ABSTRACT


Algorithm optimization can be accomplished by an exhaustive search
over alternative algorithms for performing some programming task. The
resulting algorithms are optimum only with respect to a program technology,
that is, the particular set of alternatives investigated. Thus, larger
program technologies can be expected to yield better algorithms. This thesis
contributes to the production of optimum algorithms in two ways. First,
a technique ("loop-fusion") was developed for producing new algorithms
equivalent to old algorithms, and thus expanding program technologies.
Second, a technique ("comparison") is described which reduces the effort
required by certain exhaustive searches over "well-structured" search
spaces. These techniques are applied to the production of algorithms for
evaluating matrix arithmetic expressions (MAE). (The operators, + and *,
in such arithmetic expressions are interpreted as matrix addition and
multiplication, respectively.) A method is described for producing, for
any MAE, an algorithm for its evaluation which requires fewest arrays for
holding N by N matrices, while not requiring more execution time than the
"standard" MAE evaluation algorithm. Although the algorithm-production
method used is basically an exhaustive search over a large space of
program alternatives for each subexpression of the given MAE, the effort
this method requires grows only linearly with the number of operators in
the given expression.

CHAPTER I


1.1  General  Problem  of  Program  Optimization 

Problems are often presented to a human programmer in ways which
allow a multitude of possible approaches to their solution. The
programmer must then decide, on some basis, which possible approach should
be used. Furthermore, a given "program" is, if it is at all useful,
normally destined to be a sub-program of various larger programs. One
has no reason to hope that the implementation of the sub-program is
the same in the optimum implementation of each program in which it is
embedded. Thus, a program cannot actually be optimized permanently in
isolation. The programmer must optimize it for the particular context
in which it is to be used.

One form of the program-optimization problem can be stated as
follows:

Given a programming task formulated as a desired
transformation or mapping of input data to output
data, find that program which "best" implements
the task.

A program specifies the sequence of operations some processor is to
perform in order to accomplish the desired transformation. Each
operation is itself a transformation, which must be drawn from a fixed
set of possible operations, the repertoire of instructions of the
given processor. When more than one sequence of operations can be
used to accomplish a given programming task, such sequences are termed
equivalent, as are the programs which specify them. That program is
"best" which, of all equivalent programs, minimizes some program cost
function.

Programs, in their specification of operation sequences, can be
associated with "costs". These costs need not be monetary. In general,
they represent the amount of some scarce resource used in creating or
executing a given program. Examples of costs include:

(1)  Programmer's  time  required  in  creating  a  program 

(2) Processor time used in executing some program

(3) Processor memory space required before the program
can be executed.

Costs (2) and (3) are of particular interest. These costs describe
the performance of the final product, the program. We therefore define
a program cost function to be some non-decreasing function of the
processor time expended, and memory space required, during the program's
execution.

In fact, a class of optimization problems can be devised, depending
on the precise description of the program cost function. For example,
one can conceive of an environment in which total main memory space is
limited. Then no program is acceptable which requires more main memory
than this amount. Among the equivalent programs which are acceptable,
some require least processor execution time. These would then be
preferred. In place of a single function of memory space and execution
time, one of the variables enters the optimization problem in a constraint,
while the other makes up the function to be optimized.

An equally valid description of the program cost function can
reverse the roles "space" and "time" played in the previous example.
That is, we conceive of a situation wherein constraints are placed on
the execution time of some programming task, leaving us to choose among
the equivalent programs for this task one which uses least memory space.
Such a situation arises in certain "multi-programmed" computer systems
which employ the physical memory allocation technique called "paging".
In these environments, potentially large amounts of space are available,
at increasing cost in "response time". It would seem desirable in such
an environment to choose a program which uses least space, while
requiring the processor time required for execution to lie below some
upper limit.
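
The two constrained formulations above can be sketched as follows. This is
only an illustration, assuming that each equivalent program is summarized by
a hypothetical (time, space) pair; the names Candidate,
least_time_within_space, and least_space_within_time are invented for this
sketch and do not come from the thesis.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one equivalent program (illustrative only)."""
    name: str
    time: int   # processor time, in arbitrary units
    space: int  # memory space, in arbitrary units


def least_time_within_space(cands: Iterable[Candidate],
                            space_limit: int) -> Optional[Candidate]:
    """Space enters as a constraint; time is the function to be minimized."""
    acceptable = [c for c in cands if c.space <= space_limit]
    return min(acceptable, key=lambda c: c.time, default=None)


def least_space_within_time(cands: Iterable[Candidate],
                            time_limit: int) -> Optional[Candidate]:
    """Roles reversed: time is the constraint; space is minimized."""
    acceptable = [c for c in cands if c.time <= time_limit]
    return min(acceptable, key=lambda c: c.space, default=None)


# Made-up figures for three equivalent programs.
programs = [
    Candidate("p1", time=100, space=8),
    Candidate("p2", time=140, space=3),
    Candidate("p3", time=90, space=12),
]
```

The second function corresponds to the formulation pursued in this work:
a time limit is fixed, and the program of least space is sought among those
that respect it.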

The cost functions described here all depend on the measurement of
a program's execution time and space requirements. These requirements
depend on both the program and the particular input data with which that
program is supplied in a given execution. In general, a program is
written to apply to many different selections of input data. It seems
desirable to discuss its requirements for several such input data
sets at once. One convenient way to describe program behavior for large
classes of possible inputs is to "parametrize" that program's
requirements: to express the space and time requirements in terms of certain
"characteristic numbers" derivable from the data.

Thus far, the discussion of optimization of programs has implied
a search over all possible programs which specify a given programming
task. Unfortunately, for many programming tasks we know of no way to
characterize all programs which specify that task. Nevertheless, methods
of improving programs are still desirable.

For certain programming tasks, a number of alternative programs are
known. A search can be performed which is limited to a set of programs
for a programming task which are derivable in some specific ways. The
best program among those considered will be termed "optimum with respect
to some (specified) technology", or technologically optimum, for short.
Technologically optimum programs may well yield near-optimum values for
the program criterion function. Additional improvements can be realized
by increasing the number of programs derivable, that is, by expanding
the technology.


1.2  A  Specific  Problem 

The general problem of program optimization tends to founder on the
problem of programming task representation. As presented to a human
programmer, many if not most programming tasks are not well-defined. Not
only does the programmer often have great scope in choosing solution
techniques; often he may choose the characteristics of the solution as
well. Such freedom limits the ability of computer optimization techniques
to derive equally satisfying results. Because the limits of acceptability
of programs are vague, and indeed only informally stated, solutions
proposed by algorithms cannot be tested for acceptability.

Several classes of well-specified programming tasks do exist,
however. Each "higher-level" language construct, such as the expression
in Algol, or the DO-loop of FORTRAN, specifies a programming task
somewhat independently of specific sequences of instructions on a specific
machine. The semantics, or meaning, of instances of such constructs
thus allow more than one program to correctly specify that meaning.
Furthermore, because many instances of each construct may be presented
to an optimization (program-choice) algorithm, it is profitable to
derive such algorithms. The present work describes a method, based
ultimately on an exhaustive search over programming alternatives, for
"compiling" or "translating" one high-level language construct: the "matrix
arithmetic expression". Several authors have advocated the addition of
matrix arithmetic capabilities to various programming languages. The
matrix arithmetic expression provides a basic construction for specifying
such arithmetic.

The syntax of a matrix arithmetic expression can be taken to be
that of an Algol expression whose <variable>s are all <array
identifier>s, and whose operators are restricted to '+' and '*'. In these
expressions, + and * designate matrix addition and multiplication.
We can successfully compile technologically-optimal programs for instances
of a sub-class of all expressions. Our optimization algorithm requires that:

(1)  All  variables  of  the  expression  must  be  N-by-N  (square) 
arrays; 

(2) The expression must be "fully parenthesized"; and

(3)  The  expression  may  not  contain  common  subexpressions. 

A fully parenthesized expression syntactically describes exactly one
decomposition of the expression into one-operator subexpressions. Thus,
we will make no attempt to employ the associative and distributive laws
of matrix algebra to derive equivalent expressions. A common
subexpression is a subexpression, more than one instance of which occurs in
the expression. Thus 'A+B' is a common subexpression of the expression

(A+B)  *  (A+B). 

We  will  also  assume  that  the  matrices  we  deal  with  are  "general",  so 
that  no  special  space-saving  storage  techniques  are  possible. 

We will describe a particular "technology" for programs which
evaluate matrix arithmetic expressions. From this technology, an expanded
technology can be developed, using a technique which may well be useful
for creating new equivalents to programs for other programming tasks.
A search-procedure is developed over the programs in the expanded matrix
arithmetic expression technology. This search-procedure accepts only
those programs whose time-requirement is no greater than that required
by the "standard" technology. It searches for a program whose memory-
space requirement is least. The search-procedure is shown to require
compile-time computation resources which increase exponentially with the
number of operators in the given expression. Finally, a general
technique for reducing the effort of any "structured" exhaustive search is
developed, and applied to the present search, reducing compile-time effort
to linear dependence on the number of operators in the given expression.

From the (informal) semantics of matrix arithmetic expressions, it
should be clear that techniques known for "compiling" scalar expressions
are applicable to matrix arithmetic expressions. Techniques are
available which "compile" arbitrarily complex scalar expressions into instances
of a small number of basic assignment statements. For example, basic
assignment statements for scalar arithmetic expressions with the syntax
of matrix arithmetic expressions are:

A ← B + C and A ← B * C.

A compilation technique based only on the syntax of the given expression
can resolve any expression into a sequence of systematic substitution
instances of the above assignments. (A systematic substitution replaces
each variable name in an assignment with new names, chosen so that the
new name for the left-side variable does not agree with the new name
of any right-side variable. Thus,

A ← X * Y and Z ← Q * W

are systematic substitution instances of A ← B * C, but

A ← A * X and Z ← Q * Z

are not.)
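
The condition in the parenthetical remark can be checked mechanically. The
sketch below is an illustration only: it assumes an assignment is written as
a hypothetical tuple (left, operator, right1, right2), and it reads
"systematic" as meaning that one old name is always replaced by the same
new name. The function name is invented for this example.

```python
def is_systematic_instance(basic, candidate):
    """Return True if `candidate` is a systematic substitution instance
    of `basic`. Both are (left, operator, right1, right2) tuples, an
    encoding assumed only for this sketch."""
    b_left, b_op, b_r1, b_r2 = basic
    c_left, c_op, c_r1, c_r2 = candidate
    if b_op != c_op:
        return False
    # The renaming must be consistent: one old name, one new name.
    renaming = {}
    for old, new in ((b_left, c_left), (b_r1, c_r1), (b_r2, c_r2)):
        if renaming.setdefault(old, new) != new:
            return False
    # The new left-side name must differ from every new right-side name.
    return c_left not in (c_r1, c_r2)


BASIC_PRODUCT = ("A", "*", "B", "C")
```

Applied to the four examples in the text, the check accepts A ← X * Y and
Z ← Q * W and rejects A ← A * X and Z ← Q * Z.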

New variables can be chosen to hold the values of subexpressions
until these values are input to a later assignment. Thus, the
functional composition of the binary addition and multiplication functions
making up the given expression can be achieved. Matrix arithmetic
expressions can be compiled in exactly the way scalar expressions are
compiled, yielding sequences of systematic substitution instances of
the (syntactically) same basic assignment statements. For each of the
basic matrix assignment statements, an algorithm can be devised. Thus,
a compilation into basic assignments serves to produce an algorithm for
computing the expression.
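
A syntax-directed compilation of this kind can be sketched in a few lines.
The sketch assumes a fully parenthesized expression whose variables are
single letters, and it names intermediate results T1, T2, and so on; these
conventions, and the function name compile_expression, are assumptions made
for illustration and are not the notation of the thesis.

```python
def compile_expression(text):
    """Resolve a fully parenthesized expression such as "((A+B)*(C+D))"
    into a list of basic assignments (target, operator, left, right).
    Returns the name holding the final value, and the assignment list."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0
    code = []

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse():
        tok = next_token()
        if tok == "(":
            left = parse()
            op = next_token()
            if op not in ("+", "*"):
                raise ValueError(f"expected '+' or '*', found {op!r}")
            right = parse()
            if next_token() != ")":
                raise ValueError("expected ')'")
            # A fresh variable holds the value of this subexpression.
            temp = f"T{len(code) + 1}"
            code.append((temp, op, left, right))
            return temp
        if tok.isalpha():
            return tok  # an input variable
        raise ValueError(f"unexpected token {tok!r}")

    result = parse()
    if pos != len(tokens):
        raise ValueError("trailing characters after expression")
    return result, code
```

Because every intermediate receives a fresh name, each emitted assignment
is a systematic substitution instance of one of the two basic assignments.
This sketch makes no attempt to re-use intermediate variables.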

In the case of scalar expressions, relatively little memory space
is required to hold each intermediate result. Accordingly, many
compilers make no attempt even to re-use variables used for intermediate
results.

Matrix arithmetic presents some motivation for space-optimal
compilation. In compiling matrix arithmetic expressions using this
technique, a set of N^2 variables must be allocated to hold each intermediate
result. Because N may well be large, a significant amount of memory
could be demanded by the compiler for intermediate matrices. Accordingly,
some effort by the compiler to reduce the storage space it allocates
for the compiled program is desirable.

In the discussion of the general problem of program optimization,
it was noted that, realistically, program time and space requirements
should be parametrized. Both the time and space required for computing
matrix arithmetic expressions can be regarded as functions of N, where
each matrix entering the expression is N-by-N. Thus, each set of
variables capable of holding a matrix (called a 2-array) must include N^2
variables. The time requirement for computing

A ← B * C

can be stated as: N^3 additions, and N^3 multiplications, when one
algorithm is used, or

3N^3/2 additions, and N^3/2 multiplications

when an algorithm recently discovered by S. Winograd [14] is employed.

In general, time and space requirements for computing a matrix
arithmetic expression are polynomials in N. For large enough N, the
leading term of a polynomial in N dominates the polynomial, in the
sense that the contributions of all other terms are negligible with
respect to it. (The leading term of a polynomial in N is that term in
which N has the largest exponent.) Accordingly, we will approximate
time and space requirements by the leading term of their representation
as polynomials in N.

The space-requirement for a program to calculate a given matrix
arithmetic expression has several components, each corresponding to a
term in the polynomial in N. Since we have stated that certain terms
of the polynomial will be ignored in our optimization, it seems
worthwhile mentioning the program entities to which they correspond.

The leading term of the space-polynomial clearly counts the number
of 2-arrays required. The linear term in N measures the number of
1-arrays, each capable of holding a 1-by-N or N-by-1 matrix, or vector,
of values. The program itself is represented in memory at execution
time. However, its size is independent of N, since we will implement
it by means of "loops". The program-size thus enters the constant
term of the space polynomial. Thus, the space requirement for a matrix
arithmetic expression's evaluation is dominated by the number of 2-arrays
needed for that evaluation.

We will seek a technologically-minimum-space program. The program-
class we search will include only programs whose time-requirement is
no greater than a time-standard, derivable from the given expression.

Two methods for computing matrix multiplication have been alluded
to. One, the set of N^3 algorithms, requires N^3 scalar additions and
multiplications to perform a matrix multiplication; the other, the set
of N^3/2 algorithms, requires 3N^3/2 additions and N^3/2 multiplications.
Suppose all basic matrix multiplication statements in the "standard"
compilation of some expression are implemented by the same algorithm,
and that no element of any subexpression is recomputed during the
expression's evaluation. Then the number of scalar additions and
multiplications needed for this no-unnecessary-computation implementation
forms a reasonable upper bound on the time requirement of an "acceptable"
program.

We will seek a program for evaluating a given matrix expression
whose space-requirement is least, subject to the requirements that the
program

(1) be generatable by techniques to be presented, and

(2) require no more time than a no-unnecessary-computation
sequence of one-operator basic assignments.

Programs which satisfy (2) are termed minimum-computation-time
programs. We are seeking a program whose space requirement is least,
where we approximate a program's space requirement by the number of
2-arrays it uses. The 2-array requirements of two programs will thus be
compared in the course of the search outlined. However, we can show that
certain 2-arrays are needed by any program to evaluate a given matrix
expression. These 2-arrays hold the matrices which are input to the
expression. These variables must remain present and undisturbed
throughout the expression's evaluation. Thus, input 2-arrays do not affect
the comparison of two programs. In effect, only non-input 2-arrays need
be counted in the program criterion function. These non-input 2-arrays
will be termed "intermediate 2-arrays".

We will demonstrate an expansion of the basic compilation technology
which will introduce more "basic" matrix assignment statements. These
statements will contain more than one operator. In fact, they are
derived by substituting the expressions of basic assignment statements for
the variables in other assignments. Corresponding to each new assignment,
we will show how an algorithm, called a matrix elementary algorithm (MEA),
can be constructed, having the following properties:

(1) Its time requirement is the same as that of the sequence of
basic one-operator assignments from which its expression
was derived;

(2) It requires only k*N intermediate variables for its
evaluation. Here, k does not depend on N.

These algorithms are created from sequences of basic (1-operator)
assignment algorithms by a process called "loop-fusion". Basically,
this process allows small portions of the matrix which represents the
value of a subexpression to be computed, and then used at once in
computing a portion of the expression enclosing that subexpression.
This portion of the intermediate result need not be retained longer.
The variables used to hold this part of the intermediate result can
then be used to hold another portion of the intermediate result. The
fusion may thus require as few as k*N intermediate variables.
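
The idea can be illustrated on the expression (A + B) * C. The sketch below
is not the construction of the thesis; it is a hand-written example,
assuming matrices stored as lists of rows, which contrasts a sequence of two
basic assignments with one fused loop. The function names are invented for
this illustration.

```python
def evaluate_unfused(A, B, C):
    """Two basic assignments: T <- A + B, then D <- T * C.
    T is a full intermediate 2-array of N*N variables."""
    n = len(A)
    T = [[A[i][k] + B[i][k] for k in range(n)] for i in range(n)]
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                D[i][j] += T[i][k] * C[k][j]
    return D


def evaluate_fused(A, B, C):
    """Fused version: row i of A + B is computed, used at once to form
    row i of the product, and then overwritten by row i + 1. Only one
    1-array of N intermediate variables is needed (k = 1)."""
    n = len(A)
    row = [0] * n  # holds one row of A + B at a time
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            row[k] = A[i][k] + B[i][k]
        for j in range(n):
            total = 0
            for k in range(n):
                total += row[k] * C[k][j]
            D[i][j] = total
    return D
```

Both versions perform the same element additions for A + B and the same
multiplications for the product, so the fused form meets the time
requirement of the unfused sequence while replacing an intermediate 2-array
with a single 1-array.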

The technique of loop fusion may permit combining two loops
which are not part of matrix arithmetic algorithms. One of the
principal results of this thesis is a set of sufficient conditions under
which two loops may fuse into one computationally equivalent loop. Any
technique for generating equivalent programs increases the set of
programs over which a technologically optimizing algorithm may search.
Thus, loop fusion holds potential for improving programs for tasks other
than evaluation of matrix arithmetic expressions.

Loop fusion permits generation of a potentially infinite number of
matrix elementary algorithms (MEA's). However, the syntax of their
associated expressions is more restrictive than the syntax of matrix
arithmetic expressions. Not every matrix arithmetic expression can be
evaluated using a single matrix elementary algorithm. As a result,
techniques are needed for deciding just which MEA's should be used to
evaluate each subexpression of a given matrix arithmetic expression.

The optimum decomposition of matrix arithmetic expressions into
expressions which MEA's are capable of evaluating cannot be decided apart
from the given expression. In other words, no one MEA is obviously
better than another, for all expressions. For example, it might be
supposed that the larger the expression evaluable by an MEA, the better
that MEA is. The intermediate variable requirements of a "large" MEA
are no worse than those of a smaller MEA. In some sense, the overhead
of computing the large expression can apparently be allocated over
more operators, reducing the per-operator storage costs. Unfortunately,
large-expression MEA's have another property which limits their
usefulness: every input of an MEA must be present simultaneously. Thus,
a large-expression MEA whose inputs are all intermediate results requires
more intermediate 2-arrays to be present than a similar small-expression
MEA. Nonetheless, when a large-expression MEA's inputs correspond
primarily to inputs to the given expression, its use is desirable.

An algorithm, called the "leaves-in algorithm", was devised for
generating all the possible decompositions of a given matrix arithmetic
expression into MEA's. This algorithm is "efficient" in the sense that
it never re-generates an MEA used to evaluate a particular subexpression,
as the evaluation rules for other subexpressions are varied. Instead,
all possible MEA-decompositions which can be used to evaluate a
subexpression are retained in memory, and combined with the
MEA-decompositions for evaluating other subexpressions to generate new
MEA-decompositions. Unfortunately, the "effort" (computation time) required
by this algorithm was found to grow exponentially with the number of
operators in the expression, for certain expressions. This potentially large
effort is undesirable, since it makes the cost of obtaining a technologically
optimum program unreasonably large.

A general technique for reducing the effort required by certain
exhaustive searches was devised, and applied to the leaves-in algorithm.
The technique is not "heuristic", in that no chance of missing an optimum
solution is introduced by its use. Furthermore, the technique may well
be useful in reducing the effort required by other exhaustive search
procedures.

Roughly, in any exhaustive search, "states" of the search are
produced. Often, not all variables in the state-vector are computed
simultaneously. We can speak of a "partial state", which represents the
situation obtained when not all state variables are given values. A
partial state may lead to any of a large number of complete states,
depending on the assignments made to the variables not assigned values
in the partial state. A particular set of values assigned to the non-
partial-state variables will be termed a "completion".

Most of the effort in an exhaustive search involves generating all
the possible completions of each partial state. Now suppose two partial
states are known such that the same variables are fixed (to different
values) in each, and such that any completion of one is a possible
completion of the other. Thus, if A and B are partial states, and C is a
given "completion", if A ∪ C (read "A completed by C") is valid, so is
B ∪ C. Also, suppose N(S) is the value of a state, and we seek a
state of minimum value. If for all completions C, N(A ∪ C) > N(B ∪ C),
then completions of partial-state A need not all be examined. For every
complete state A ∪ C generated by any completion C, there is a complete
state B ∪ C generated by that same C which is better. Now, notice that
the statement

(1) ∀C [N(A ∪ C) > N(B ∪ C)]

is a predicate on A and B which is independent of C. If we can discover
a predicate equivalent to (1) whose evaluation does not require the
generation of all possible completions C, we can compare partial states
using it. The resulting algorithm may reduce the number of states
generated tremendously.
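
The effect of such a comparison can be seen in a toy search. The sketch
below is only an illustration under strong assumptions that are not taken
from the thesis: a partial state is a hypothetical (key, cost) pair, two
partial states admit the same completions exactly when their keys agree, and
the value of a complete state is the sum of the two costs. Under these
assumptions statement (1) holds for A and B exactly when their keys agree
and cost(A) > cost(B), so it can be tested without generating any
completion.

```python
from itertools import product


def valid(partial, completion):
    """A completion fits a partial state when their keys agree (assumed)."""
    return partial[0] == completion[0]


def value(partial, completion):
    """N(A ∪ C) for this toy: costs simply add (assumed)."""
    return partial[1] + completion[1]


def dominated(a, b):
    """Stand-in for a predicate equivalent to (1): every completion of `a`
    yields a worse state than the same completion of `b`."""
    return a[0] == b[0] and a[1] > b[1]


def search(partials, completions, prune):
    """Return (minimum value, number of complete states generated)."""
    if prune:
        partials = [a for a in partials
                    if not any(dominated(a, b) for b in partials)]
    best, generated = None, 0
    for p, c in product(partials, completions):
        if valid(p, c):
            generated += 1
            v = value(p, c)
            if best is None or v < best:
                best = v
    return best, generated


partials = [("x", 5), ("x", 3), ("x", 7), ("y", 4), ("y", 2)]
completions = [("x", 1), ("x", 6), ("y", 9)]
```

With these made-up figures, the unpruned search generates eight complete
states and the pruned search generates three; both find the same minimum.
Partial states of equal cost are never discarded by the strict comparison,
so no optimum can be lost.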

The power of the technique described depends on several properties
of the space searched, and the variables chosen to describe states in
that space. In some searches, the comparison may lack "power"; for
example, it may only hold between identical partial states. In other
searches, few pairs of partial states which yield true for the value of
the comparison may ever be generated. Nonetheless, in some searches over
"well-structured" spaces, such comparisons may drastically reduce the
search effort.

In the search for the best MEA-decomposition of a given matrix
arithmetic expression, the comparison theorem proved quite useful.
Here, a "partial state" corresponds to a particular decomposition of
one subexpression. A "state" corresponds to a particular decomposition
of the entire given expression. A partial state may be completed by
any decomposition of some subexpression not part of the partial state's
subexpression. States arise from fusions of the MEA's used in computing
subexpressions. The evaluation rule for states depends on components
in the partial state and in the completion.

The evaluation function of an MEA depends on the set of subexpressions
whose values are inputs to this MEA. The number of intermediate 2-arrays
needed to compute each input is used to determine the number needed to
compute the MEA's result. An MEA A which is extended by loop fusion
becomes an MEA, A ∪ C, whose inputs include all inputs of A, as well as
additional inputs, C. These new inputs, adjoined to the input sets of
two different algorithms, may completely change the relative
space-efficiency of the algorithms.

We were able to discover just when two MEA-decompositions for
evaluating some subexpression were interchangeable. By interchangeable, we
mean that any valid completion of one is a valid completion for the other.
Furthermore, we were able to discover the evaluation-rule, N(S), for
MEA-decompositions S. Also, a predicate equivalent to (1) for this
evaluation rule was discovered. The application of this comparison predicate
to the leaves-in algorithm reduces the effort required to a value
proportional to, rather than exponential in, the number of operators in the
given expression.

1.3  Prior  Work 

Several aspects of the prior art should be discussed. Some results
have been published relating to the general problem of program
optimization. Also, various authors have attacked specific problems in this
area. We freely admit to being influenced by their approaches. Some previous
work has been directed at the production of optimum compilations for
scalar expressions, work whose basic techniques we build on. Finally, some
work on optimum compilations of matrix arithmetic expressions has been
published, and should be mentioned here.

One of the major forerunners of the approach we employ here seems
to have been Simon's "Heuristic Compiler" [9]. Simon's work appears to
have as its goal the production of some program to accomplish a given
programming task; however, he appears to have been one of the first to
describe a wide variety of programming tasks in such a way that a space
of program alternatives for their accomplishment could be visualized.
His "before-and-after" description of procedure operations forms such a
"state description" of programs. Indeed, he explicitly mentions the
possibility that "there will generally be many programs (not all equally
efficient or elegant) that will do the same work" [9, pg. 6]. This very
naturally suggests a search for the most elegant, or efficient.

Several authors have attacked problems in the general area of
optimizing the compilation of specific language-constructs. Notably,
Reinwald and Soland [7,8] have discussed at length the problem of
converting "decision tables" into optimal computer programs. Interestingly,
they adopt an approach based on an exhaustive search over certain
program variations. They advocate use of "branch and bound" techniques for
reducing the space searched. Furthermore, they suggest that the space
of programs they search exhausts the space of all programs which can
be said to be "translations" of a given decision table. Thus, the
programs they produce are claimed to be time-optimal, or space-optimal,
and they even propose means for locating optimal programs whose criterion
function is a linear combination of memory space and execution time.

Another group of problems has been attacked by Winograd [11,12,13].
Winograd treats both the problem of designing minimal-time hardware for
performing certain computer instructions, and that of designing
minimal-operation-cost algorithms for performing certain operations. For the
most part, Winograd concerns himself with deriving theoretical lower
bounds on the "time" required for certain computer operations. In fact,
he usually also demonstrates procedures which yield near-minimal-time
operations. Although his approach is not constructive, nevertheless
his search for theoretical lower bounds on quantities we attempt to
minimize is certainly relevant. Indeed, one of his results (in [14]) directly
concerns matrix multiplication. Here, he presents an algorithm for
calculating the dot-product of two vectors which, when applied to matrix
multiplication, reduces the number of multiplications required from $n^3$
to approximately $n^3/2$. We present a derivation of this result in
Appendix I, together with algorithms which implement it, and which can
be used as "basic algorithms" for the matrix assignment A ← B * C.
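
The saving in multiplications can be illustrated with a short Python sketch. It uses the commonly stated form of Winograd's inner-product identity for vectors of even length; the derivation in Appendix I is not reproduced here, and the function names `winograd_dot` and `matrix_mult_count` are illustrative assumptions rather than names from the thesis.

```python
def winograd_dot(x, y):
    """Inner product of two even-length vectors.

    The terms xi and eta each depend on one vector only, so in a matrix
    product they can be computed once per row of A and once per column
    of B and then reused for every element of the result.
    """
    n = len(x)
    assert n == len(y) and n % 2 == 0, "sketch assumes equal, even lengths"
    half = n // 2
    xi = sum(x[2 * j] * x[2 * j + 1] for j in range(half))
    eta = sum(y[2 * j] * y[2 * j + 1] for j in range(half))
    mixed = sum((x[2 * j] + y[2 * j + 1]) * (x[2 * j + 1] + y[2 * j])
                for j in range(half))
    return mixed - xi - eta


def matrix_mult_count(n):
    """Scalar multiplications for an n-by-n product built on winograd_dot."""
    per_element = n // 2            # the "mixed" products, once per C[i][j]
    per_row_or_column = n // 2      # xi per row of A, eta per column of B
    return n * n * per_element + 2 * n * per_row_or_column
```

For n = 4 the count is 48, that is, $n^3/2 + n^2$, against 64 for the definition-based product.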

Of more direct relevance to the compilation of matrix expressions
is an algorithm developed for space-optimal compilation of scalar
arithmetic expressions, and described by I. Nakata [5]. This algorithm
is based on an analysis of an expression into a data-flow diagram, a
precedence-graph showing the necessary time-sequence of subexpression
computation. This structure inspired the analysis of the leaves-in algorithm.
Nakata's algorithm, which we describe briefly here, produces a linear order
for the evaluation of subexpressions. This order, of all possible
evaluation orders, uses fewest intermediate variables.

Let x be a node in the parse-tree T of an expression E. Suppose
$n(x)$ represents the minimum number of intermediate variables needed in
computing the subexpression whose sub-tree is rooted at x. Apply the
following algorithm to each node of T, applying it to every descendant
of a node y before y.

1. If x is a leaf of T, x's subexpression is a variable input
to E, and needs no algorithm for its computation. Set $n(x) = 0$.

2. If x is not a leaf of T, let its immediate descendants be $x_1$
and $x_2$. Then $n(x_1)$ and $n(x_2)$ have already been computed.

a. If $n(x_i) > n(x_j)$, ($i,j = 1,2$), compute $x_i$'s
subexpression first. One cell retains the result of this
computation during the following calculation of $x_j$'s
subexpression. Set $n(x) = n(x_i)$, since enough cells
remain of the $n(x_i) - 1$ to compute $x_j$, which needs
only $n(x_j) \le n(x_i) - 1$.

b. If $n(x_1) = n(x_2)$, then regardless of which subexpression
is computed first, one cell is required to hold its
result. Then an additional $n(x_1)$ cells are needed in
computing the other result. x's subexpression can be
computed, using A ← A op B, into one of the cells now
holding a subexpression's result. Hence, $n(x) = n(x_1) + 1$.
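
Rules 1, 2a and 2b can be sketched in Python as a bottom-up labeling of the parse-tree. The representation is an assumption made for illustration: a leaf is a string, an operator node is a tuple `(op, left, right)`, and the function name `min_cells` is hypothetical.

```python
def min_cells(node):
    """Return n(x): the number of intermediate cells needed for node x."""
    if isinstance(node, str):
        return 0                      # rule 1: a leaf needs no cells
    _op, left, right = node
    n_left, n_right = min_cells(left), min_cells(right)
    if n_left != n_right:
        return max(n_left, n_right)   # rule 2a: do the costlier side first
    return n_left + 1                 # rule 2b: a tie costs one extra cell


# (A + B*(C*D)) * (E + F*(G+H)), written with explicit right-association
example = ('*',
           ('+', 'A', ('*', 'B', ('*', 'C', 'D'))),
           ('+', 'E', ('*', 'F', ('+', 'G', 'H'))))
cells_needed = min_cells(example)
```

Each node is visited once, so the labeling takes time proportional to the number of nodes in the tree.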

The operation of this algorithm depends on the presence of elementary
operations for performing assignment statements like A ← A op B, where
one input variable is replaced by the assignment's result. Otherwise,
this same algorithm can be extended directly to matrix arithmetic
expressions. We show later that operations very similar to

    A ← A * B, and of course A ← A + B

are available for matrices A and B.

Galler and Perlis [4] were early advocates of the addition of
matrix arithmetic capability to compiler languages. They comment on the
potential danger of allowing a compiler to allocate large amounts of
storage for the evaluation of matrix expressions. They suggest a number
of elegant devices for performing various matrix manipulations. One of
particular elegance seems to be their proposal for implementation of
the matrix transpose operation. No instructions are needed at execution
time. Instead, in each algorithm compiled, the compiler interchanges
the indices in the subscript positions of each subscripted variable
which is a transposed matrix.

Galler and Perlis also present an interesting technique for
computing a succession of matrix products. Indeed, this technique
demonstrated the computation of subsets of the elements of a matrix result,
followed by immediate use of those elements. We generalize this notion
to that of "loop fusion", applicable to algorithms other than matrix
arithmetic algorithms, in the sequel.

Galler and Perlis' algorithm for matrix multiplication:

    Suppose we wish to compute A * B *...* K, a product of
    matrices.

    Let $X^i$ represent the ith row of matrix X.

    Then we can compute

        $A^i * B = (A * B)^i$

    without using more than one vector of storage. By extension,
    only vectors of storage are needed in computing

        $(A * B *...* K)^i$
        by $(A * B *...* J)^i * K$

    By repeating this computation of one row-vector of the
    product for different rows $A^i$, the entire product can
    be produced, using only one matrix to hold the result.

    [Galler and Perlis also show that, in the repeated products
    of row-vectors with matrices needed to produce a result
    row, only two vectors of storage are needed, at most.]
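
A minimal Python sketch of this row-at-a-time scheme follows. It is an illustration under stated assumptions, not Galler and Perlis' own code: matrices are square lists of lists, and the name `chain_product_by_rows` is hypothetical.

```python
def chain_product_by_rows(matrices):
    """Compute matrices[0] * matrices[1] * ... one result row at a time."""
    first, rest = matrices[0], matrices[1:]
    n = len(first)
    result = [[0] * n for _ in range(n)]     # the one matrix for the result
    for i in range(n):
        row = list(first[i])                 # first working vector
        for m in rest:
            # second working vector: (current row) * m
            nxt = [sum(row[k] * m[k][j] for k in range(n)) for j in range(n)]
            row = nxt
        result[i] = row
    return result


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 2]]
product = chain_product_by_rows([A, B, C])
```

Apart from the result matrix, only the two vectors `row` and `nxt` are live at any time, whatever the number of factors.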

While the Galler-Perlis algorithm produces highly acceptable
programs for computing certain expressions, it is inapplicable to others.
For example, it does not apply to: A * (B + C) * D. Were we to
compute $A^i$ * (B + C), we could use an entire matrix to hold the value
of (B + C). Otherwise, the value of (B + C) would necessarily have to
be re-computed as each of the N rows of A was multiplied by (B + C).
The re-computation produces a program which is not
minimum-connection-time, and is hence unacceptable.

1.4 Statement of the Problem

We consider matrix arithmetic expressions (MAE's) (as distinct from
MEA's) whose syntax is:

    <MAE> ::= <MAT> | <MAT> + <MAE>

    <MAT> ::= <MAP> | <MAP> * <MAT>

    <MAP> ::= (<MAE>) | <matrix identifier>

The operators + and * signify matrix addition and multiplication
respectively. A <matrix identifier> is declared as an <array> [6].

We restrict the MAE's we investigate as follows:

(1) Each <matrix identifier> is declared to have subscript
bounds of [1:N, 1:N]. Thus, each matrix must be square;

(2) The semantics of an <MAE> or <MAT> which includes more
than one operator is interpreted to be right-associative.
Thus, we assume that the MAE A * B * C must be computed as
(A * (B * C)).

(3) No <MAE> may contain more than one instance of the same
(sub) <MAE>.

We propose to find a program which, among a certain set of programs,
computes any given MAE using the smallest number of intermediate 2-arrays,
subject to a restriction on the acceptability of programs:

Each acceptable program must be a minimum-connection-time program.

The set of programs we study consists of sequences of instances of
basic matrix assignment algorithms. The basic matrix assignment
algorithms are a potentially infinite collection of algorithms derived from
algorithms for the matrix assignments A ← B * C and A ← B + C. Because
the set of basic matrix assignment algorithms is far larger than the
collection of algorithms usually used for compilation, the technological
space minimum we obtain using them is smaller than that obtainable using
only algorithms for A ← B * C and A ← B + C. However, we can make no
claim to have discovered either time or space optimal algorithms, for we
have no proof that we have exhausted all programming possibilities in
constructing the particular set of programs studied.

1.5 Overview of Our Approach

We first describe a technique, called "loop fusion", for creating
new programs equivalent to certain given programs. This technique
produces programs which are computationally equivalent to the given programs,
and which require the same (or slightly less) execution time. Their
patterns of accessing and computing data are different, however.

Using loop-fusion, we find we can grow a potentially infinite
collection of algorithms for evaluating matrix expressions. These
algorithms, called MEA's, are grown from only five basic algorithms.
They each require internal intermediate variables proportional in number
to N.

A compilation algorithm, called the "leaves-in" algorithm, is
presented next. This algorithm discovers the space-minimal decomposition
of a given expression into MEA's. It does so by "tailoring" MEA's to
fit each subexpression of the given expression, in all possible ways.
While it succeeds in avoiding redundant re-decomposition of sub-branches,
it requires computational effort (time) which grows exponentially with
the number of operators in the given expression.

A general technique, called "comparison", is proposed to reduce
the computational effort of exhaustive search optimization. This
technique attempts to avoid generating all possible "completions" of a
partially-specified search state. It does so by comparing two interchangeable
partial states in all possible completions, without actually generating
these completions. By generalizing over all completions, a predicate
independent of any completion is produced, which compares two partial
states. Certain evaluation rules for states permit derivation of an
equivalent predicate which does not mention completions, and may hence
be evaluated by examination of only the partial states it compares.

Comparison  is  applied  to  partial  states  in  the  leaves-in  algorithm. 
Here,  a  partial  state  corresponds  to  a  possible  algorithm  for  use  in 
computing  a  single  subexpression.  A  complete  state  is  an  algorithm  for 
computing  the  entire  given  expression.  By  eliminating  many  partial 
states  as  soon  as  they  are  generated,  the  effort-requirement  of  the 
leaves-in  algorithm  is  reduced  to  a  linear  function  of  the  number  of 
operators  in  the  given  expression. 


CHAPTER  II 


II.1 Basic Definitions

The input set of an algorithm is the set of variables whose
contents just before the algorithm is executed are accessed during the
algorithm's execution.

The  change-set  of  an  algorithm  is  the  set  of  variables  stored 
into  during  the  algorithm's  execution. 

The result-set of an algorithm is the set of variables which are
(1) in the change-set of the algorithm, and (2) whose contents
immediately after the algorithm's execution are input to some other algorithm.

The  intermediate  set  of  an  algorithm  is  the  set  of  variables  in  the 
algorithm's  change-set,  and  not  in  its  result-set. 

The inputs to an algorithm are the values the variables in the
algorithm's input-set hold just before the algorithm's execution. Similarly,
the result of an algorithm is the set of values its result-set holds just
after the algorithm's execution.
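
These four sets can be computed mechanically for a straight-line statement sequence, as in the following Python sketch. The statement representation `(target, sources)` and the function names are assumptions made for illustration; which variables are "input to some other algorithm" must be supplied by the caller.

```python
def input_and_change_sets(statements):
    """statements: list of (target, sources) assignments, in execution order."""
    inputs, changed = set(), set()
    for target, sources in statements:
        # A source counts as an input only if its value on entry is read,
        # i.e. it has not yet been stored into by this algorithm.
        inputs |= {v for v in sources if v not in changed}
        changed.add(target)
    return inputs, changed


def result_and_intermediate_sets(changed, used_later):
    """Split the change-set by whether a later algorithm reads the variable."""
    result = changed & used_later
    return result, changed - result


# T <- A + B;  C <- T * D   (only C is read afterwards)
algorithm = [('T', ['A', 'B']), ('C', ['T', 'D'])]
input_set, change_set = input_and_change_sets(algorithm)
result_set, intermediate_set = result_and_intermediate_sets(change_set, {'C'})
```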

The word "algorithm" here may be taken to mean "statement-sequence",
including the sequence consisting of exactly one statement. We will
therefore use, for example, "result-set of a statement" in its sense
as defined here.

The word array is our name for the Algol [6] <subscripted variable>.
A k-subscript array corresponds to an Algol subscripted variable having
k subscript positions.

For our purposes, a k-subscript array is a named set of variables.
If two arrays A and B have different names, so

    name(A) $\neq$ name(B)

then $A \cap B = \emptyset$, i.e., A and B have no variable in common. Furthermore,
if [i] and [j] are two k-tuples such that [i] $\neq$ [j], then A[i] is some
particular variable, a member of A, and A[i] $\neq$ A[j]. Thus, different
combinations of the k subscripts select different variables, all members
of the array.

A  simple  algorithm  is  a  sequence  of  loops  and  assignment  statements 
containing  no  branches  outside  loops. 

Two simple algorithms are adjacent just when every statement of one
precedes every statement of the other, and when no statements intervene
between the last statement of the first algorithm, and the first
statement of the second algorithm.

If A is a set, the number of elements of A will be denoted size(A).

II.2 Parse-Trees and Expressions

We often find it convenient to refer to expressions by their
parse-trees. The parse-tree of an expression is a directed graph with labeled
edges. The nodes of this graph correspond to the operators and variables
in the expression in such a way that, if the expression is <E1><OP><E2>,
the parse-tree contains a node whose name is the same as <OP>, and
whose left son is a subtree corresponding to <E1>, and whose right son
is a subtree corresponding to <E2>.

Left (right) sons are located by following the branch labeled left
(right) to the node it is incident on. We often use "family tree"
terminology when dealing with trees. Other terminology used is:

    "leaf"   - A leaf of a tree has no descendants.

    "root"   - The root of a tree is a unique node
               having no ancestor.

    "result" - The result of a node X of a parse-tree
               is the value of the subexpression
               represented by the subtree rooted at X.

Each of our parse-trees has a root, the node corresponding to its
expression's main connective. The expression this operator is a part
of is a subexpression of no larger expression. Hence, the node which
corresponds to it has no ancestor. Any given algorithm which computes
an expression has an associated parse-tree, that of the expression
computed by the algorithm. In addition, we attach to its parse-tree
characteristics of the algorithm which will enable us to tell when and how
algorithms can combine. In particular, we associate
"access-characteristics" with each leaf, and the algorithm's
"result-characteristic" with the root.

II.3 Parse-Tree Examples:

(1) Expression: (A+B*C*D) * (E+F*(G+H))

Note that we have assumed a right-associative convention where
ambiguity arises, as in B*C*D. B*C*D is taken to mean B*(C*D).

(2) Parse-tree for the above expression, fully labeled. Here L
stands for left, R for right.

[Figure: parse-tree of (A+B*C*D) * (E+F*(G+H)), with each edge labeled
L or R.]

(3) In parse-trees, we will omit arrow-heads on lines pointing
downward, so that all indicated lines will be assumed to have an arrowhead
on the end nearest the bottom of the page. Also, instead of explicit
L or R labels, we will omit them in favor of a geometrical convention.
That line drawn left-most on the page of any line directed out from a
node will be implicitly labeled L. Similarly, the right-most line
directed out from a node will be implicitly labeled R. These conventions
allow us to draw the above parse-tree as:

[Figure: the same parse-tree drawn without arrow-heads or L/R labels.]

Each line of this parse-tree is still labeled Left or Right, and directed,
but the labeling is now implicit in the geometry of the drawing of the
parse-tree.
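
The right-associative reading used in these examples can be checked with a small recursive-descent parser for the MAE syntax. This is a sketch under stated assumptions: matrix identifiers are single letters, a tree node is a tuple `(op, left, right)`, and the name `parse_mae` and its inner helpers are hypothetical.

```python
def parse_mae(text):
    """Parse an MAE into nested tuples (op, left_son, right_son)."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"unexpected token {tok!r} at position {pos}")
        pos += 1
        return tok

    def mae():                      # <MAE> ::= <MAT> | <MAT> + <MAE>
        left = mat()
        if peek() == '+':
            take('+')
            return ('+', left, mae())
        return left

    def mat():                      # <MAT> ::= <MAP> | <MAP> * <MAT>
        left = primary()
        if peek() == '*':
            take('*')
            return ('*', left, mat())
        return left

    def primary():                  # <MAP> ::= (<MAE>) | <matrix identifier>
        if peek() == '(':
            take('(')
            inner = mae()
            take(')')
            return inner
        tok = take()
        if not tok.isalpha():
            raise ValueError(f"expected a matrix identifier, got {tok!r}")
        return tok

    tree = mae()
    if peek() is not None:
        raise ValueError(f"trailing input at position {pos}")
    return tree


tree = parse_mae("(A+B*C*D) * (E+F*(G+H))")
```

Because each rule recurses on its right-hand operand, B*C*D comes out as `('*', 'B', ('*', 'C', 'D'))`, matching the convention stated in example (1).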

II.4 The "$n^3$" Elementary Algorithms

We present here a set of algorithms for matrix addition and
multiplication based directly on the definition of these operations. Those
for matrix multiplication require $n^3$ additions and multiplications, hence
the name. We realize that a tradeoff between additions and multiplications
has been achieved by Winograd which reduces the number of scalar
multiplications required to $n^3/2 + n^2$. However, our techniques are not greatly
affected by the new algorithms, as will be seen later, and we prefer the
simpler, more familiar algorithms for most examples.
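
As an illustration of what "based directly on the definition" means, the following Python sketch gives two loop orders for the same product and counts the scalar multiplications. The function names and the counter are assumptions added for illustration; the loop orders are one reading of how such definition-based algorithms can differ in the order in which the result is completed.

```python
def multiply_element_at_a_time(a, b):
    """C := A * B, completing each C[i][j] before starting the next."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                mults += 1
    return c, mults


def multiply_k_outermost(a, b):
    """Same product with K outermost: no C[i][j] is final until the last pass."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    mults = 0
    for k in range(n):
        for i in range(n):
            for j in range(n):
                c[i][j] += a[i][k] * b[k][j]
                mults += 1
    return c, mults


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
c1, m1 = multiply_element_at_a_time(A, B)
c2, m2 = multiply_k_outermost(A, B)
```

Both orders perform $n^3$ multiplications and give the same matrix; they differ only in when rows of the result become available, which is what matters when algorithms are later combined.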

The algorithms are presented here in an abbreviated Algol notation.
The abbreviations we use are:

    Abbreviation      Algol meaning

    I → N             for I := 1 step 1 until N do
    I -a→ N           for I := a step a until N do
    I ←a- N           for I := N step -a until a do
    x +← e            x := x + e
    b                 begin   (See Footnote 1.)
    e                 end;    (See Footnote 1.)
    x ← e             x := e

All algorithms compute C := A op B where 'op' is either '+' or '*'. The
subscript bounds are assumed 1:N in each subscript position. Additional
vectors and elements used for temporary storage are introduced as needed,
and are assumed to be correctly declared. Each algorithm is accompanied
by a tree-like drawing, its EEPT, which abstracts certain characteristics
of the algorithm. These characteristics are sufficient to determine when
elementary algorithms can be combined into an "alg-tree". Their meaning,
names and representations are:


1. parse-tree. Represented as a tree whose nodes are operators,
   or line-ends (representing variables each of
   whose names are suppressed.)

2. space-characteristic. Represented as a lower case letter
   subscript. The characteristic partially describes
   the order in which elements of the input matrices
   are accessed and in which elements of the result
   are computed. The letters used are chosen as
   follows:

   r - "row". A row at a time is computed or accessed.
       The next row may be chosen arbitrarily.

   c - "column". A column at a time is computed
       or accessed. The next column may be chosen
       arbitrarily.

   m - "matrix". A matrix is computed before
       any part is complete, or is accessed in computing
       one part of the result.

¹ Since the control structure of the algorithms is simple, we will
usually delete b and e and use indentation to indicate the scope associated
with matching begin-end's.

Algorithm 1:

    I → N
      J → N
        C[I,J] ← 0;
        K → N
          C[I,J] +← A[I,K] * B[K,J];

Algorithm 2:

    J → N
      I → N
        C[I,J] ← 0;
        K → N
          C[I,J] +← A[I,K] * B[K,J];

Algorithm 3:

    I → N
      J → N
        C[I,J] ← 0;
    K → N
      I → N
        J → N
          C[I,J] +← A[I,K] * B[K,J];

Algorithm 4:

    I → N
      J → N
        C[I,J] ← A[I,J] + B[I,J];

Algorithm 5:

    J → N
      I → N
        C[I,J] ← A[I,J] + B[I,J];

II.5 Loop Fusing

In the current section, we intend to describe a technique whereby
two loops which are sequential statements of a program can sometimes
fuse. The result of the fusion is a single loop which is computationally
equivalent to the original sequence of two loops. This fusion requires
less intermediate storage and no more operations than the original
sequence. Thus, we can sometimes replace a sequence of loops with a
single loop which requires less intermediate storage, and no more
execution time, without changing the result of the calculation.
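
A small Python sketch makes the storage saving concrete. It is an illustrative example, not one taken from the thesis: the two functions and the particular arithmetic are assumptions, chosen so that the second loop consumes each value the first loop produces on the same iteration.

```python
def two_loops(a, b):
    """Sequential loops: the first fills a whole intermediate vector t."""
    n = len(a)
    t = [0] * n                 # intermediate storage proportional to n
    for i in range(n):
        t[i] = a[i] + b[i]
    s = [0] * n
    for i in range(n):
        s[i] = t[i] * t[i]
    return s


def fused_loop(a, b):
    """One loop with both bodies: each t value is used as soon as it exists."""
    n = len(a)
    s = [0] * n
    for i in range(n):
        t = a[i] + b[i]         # a single cell now suffices
        s[i] = t * t
    return s


a, b = [1, 2, 3], [4, 5, 6]
same_result = two_loops(a, b) == fused_loop(a, b)
```

The fused form performs the same additions and multiplications, but the intermediate shrinks from a vector to one cell once no later iteration needs an earlier value of t.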


We will define a loop to be any program of equivalent meaning to the
following flow-chart:

[Figure: flow-chart of a loop]

Box A is termed the initialization of the loop; B is the body of the
loop; and P is the predicate of the loop.

If control enters any node in a flow-chart, it does so through a line
directed toward the node. Further, control leaves a flow-chart node (if
it leaves at all) through a line leading away from that node. We will
assume that any flow-chart having exactly one entrance and one exit can
be substituted for nodes drawn as square boxes. Such flow-charts
may include assignment statements, as well as branches, which are drawn
as circles with more than one exit. A branch may test any
predicates on program variables to decide which of its exits control is to
leave through; it may not, however, include assignment statements, that
is, it may not specify that a variable of the program be stored into.


Let U be the collection of relevant program variables. We will
judge the effect of a flow-chart by its effect on these variables. That
is, two flow-charts will be termed equivalent if and only if the contents
of all variables in U after their executions are the same, if the contents
of all variables in U before their entrances agreed.

We present a series of equivalent flow-charts. Here P' is a
Boolean variable not in U, so that, in particular, it is referenced
nowhere else in the flow-chart of which this loop is a part.


Flow-chart 3 simplifies the predicate P so that we may assume that the
decision to be made when the branch on P' is reached is pre-established,
at the time the last variable in P is assigned a value.


We are attempting to emphasize the repetitious nature of a loop.
In fact, if we know that P' would be set true the first K times the
box was entered, then flow-chart 3 is equivalent to

    [A,P'] → [B,P'] → [B,P'] → ... → [B,P']     ([B,P'] repeated K times)


Thus, a loop can be regarded as a compact abbreviation for a certain
sequence of square flow-chart boxes. The "test" has no effect on the
relevant program variables. Its effect is to ensure that one can create
a flow-chart which can be executed a variable number of times.

We will summarize the effect of any square box on the variables in
U by a set-assignment statement. Observe that the computation performed
by any box assigns certain values to some of the variables of U. The
variables so changed (stored into, or assigned to, are equivalent but
less compact terms) are some subset of U determined by the flow-chart
substituted for the box, as well as by the contents held in certain
variables on entry to the box. Similarly, the values stored into these
changed variables are functions of the box, and the input values. A
set-assignment statement describes this relation by listing in a single
assignment statement the set of variables changed, the function mapping used to
compute their values, and the set of variables whose values are used in
the computation.

Set-assignment statement example:

    R ← f(A)

In the above example, R is the set of variables changed by the
set-assignment statement, f is the function used to compute their values, and
A is the set of variables input to the statement. Note that it may be
impossible to determine the membership of each of these sets of variables
without executing the flow-chart summarized by the set-assignment
statement at the proper time in the program. However, we can discuss relations
between these sets of variables, leaving to another problem the task of
deciding if these relations are satisfied. There will be no loss of
clarity in the sequel in using "assignment statement" to stand for either
the usually understood assignment statement, or the set-assignment
statement.

As a consequence of our introduction of the set-assignment statement,
and the flow-chart equivalences sketched above, we will regard a loop
as a certain sequence of set-assignment statements. We will allow the
sets of variables and the functions of these statements to differ
arbitrarily from statement to statement of the sequence. We will write a
loop as one or more set-assignment statements, separated by semi-colons,
and enclosed in square brackets:

    [R ← f(A); S ← g(B)]

The sequence of statements this represents is:

    $(R)_1 \leftarrow f_1((A)^1);\ (S)_1 \leftarrow g_1((B)^1);\ (R)_2 \leftarrow f_2((A)^2);\ \ldots$

The center of a loop will refer to the statements enclosed in brackets
in the loop's abbreviation. In the above example, the loop's center is

    R ← f(A); S ← g(B).

More generally, let X be a set of variables, and $S_J$ be some particular
occurrence of a set-assignment statement, S.

Then we define $(X)^J$ to be the subset of X input to $S_J$, and
$(X)_J$ to be the subset of X stored into by $S_J$.

Suppose S is a statement in loop L. To refer unambiguously to an
occurrence of S, J must indicate the occurrence's position in the sequence
of assignments resulting from L's iteration. J must also distinguish
among the possibly several statements of the loop's center. We write
J = (L,i,k), where L is the name of the loop, i gives the number of the
statement within the center of L (which is itself a sequence of
statements), and k gives the iteration number of the loop.

We will denote the contents of a variable, or set of variables,
X, just after the Kth assignment statement in some sequence as v(K,X).
In general K is a triple², (L,i,k), giving L = loop-name, i = statement
number in loop, and k = iteration number of this occurrence, to uniquely
identify the assignment-statement we mean.

If A and B are two sets of variables, we say v(K,A) = v(J,B) just
when there is a one-one correspondence between A and B, and when
v(K,a) = v(J,b) for each pair of corresponding variables a in A and b in B.

Also, let c(K,X) denote the contents of X just before the Kth
assignment statement.


² Triples will be identified by capital letters, and the third
element of that triple will be the lower-case letter corresponding to the
triple's identifier.

We will write $(A)_K \equiv (B)_J$ to mean

    $v(K,(A)_K) = v(J,(B)_J)$

We say that two sequential loops, L1 and L2, can fuse when a loop
L3 constructed from the components of L1 and L2 is computationally
equivalent to the statement sequence L1;L2. In our notation, the
set-assignment statement within the loop-brackets corresponds to the
repeated box of the corresponding loop. In addition, various loop
initialization is summarized there. We symbolize the fusion L3 of
L1: [R ← f(A)] and L2: [S ← g(B)] by

    L3: [R ← f(A); S ← g(B)].


II.6 Fusion in Flow-Charts: Graphic Description

Let P', Q' be Boolean variables not in U, so that P' occurs only
in those boxes [X,P'] and branches which explicitly mention it.

Under certain conditions, flow-chart F1 is equivalent to flow-chart F2.
Flow-chart F2 is itself one loop.

Suppose v(j,P') is the value P' takes on just before the test box is
entered for the jth time in flow-chart F1. Define v(j,Q') similarly.
If

    v(j,Q') = v(j,P') for all j,

then the fusion F2 can be simplified:

[Figure: flow-chart F3, the simplified fusion]


II.7 Loop Fusion Conditions

Let U be the (finite) set of all relevant program variables.

Let J be a triple (L,i,k) where L is a loop-name,
    i is the statement-number in L's center, and
    k is the iteration number of L.

Let v(J,X) denote the contents of X just after set-assignment J.

Let c(J,X) denote the contents of X just before set-assignment J.

Let $(X)^{[J]}$, and $(X)^J$, mean the subset of X input to set-assignment J.

Let $(X)_{[J]}$, and $(X)_J$, mean the subset of X stored into by set-assignment J.

Let $(X)_J \equiv (Y)_K$ mean $v(J,(X)_J) = v(K,(Y)_K)$.

Let $(X)^J \equiv (Y)^K$ mean $c(J,(X)^J) = c(K,(Y)^K)$.

If and only if $c(I,x) = c(J,x)$ for all $x \in (U)^I$,
then, for all f: I terminates if and only if J terminates, and

    $(U)^I = (U)^J$ and $(U)_I = (U)_J$
    and $(U)^I \equiv (U)^J$ and $(U)_I \equiv (U)_J$

Theorem 1: If $(R)_j$, $(A)^j$, $(S)_j$ and $(B)^j$ are a finite collection of
           finite sets of variables then statements a. and b.
           are equivalent:

    a. for all i and j,

       (1) $(R)_j \cap (S)_i = \emptyset$ if $i < j$ and

       (2) $(A)^j \cap (S)_i = \emptyset$ if $i < j$ and

       (3) $(B)^j \cap (R)_i = \emptyset$ if $i > j$

    b. for all f and g, if there exist loops L1 and L2 whose
       jth set assignments are:

           $S[L1,1,j] = (R)_j \leftarrow f_j((A)^j)$

           $S[L2,1,j] = (S)_j \leftarrow g_j((B)^j)$

       then there is a loop L3, constructed from L1 and L2
       in the same way that F2 is constructed from F1,
       and the sequence L1;L2 is computationally equivalent to L3,
       written 'L1;L2 $\equiv$ L3'.

An examination of conditions (1) - (3) is appropriate to illustrate
the essential simplicity of the requirements. First, note that when
A and B are sets of variables, "$A \cap B = \emptyset$" means that A and B have no
variables in common. We interpret a "variable" to be a unique memory
cell of some processor. Then, "$A \cap B = \emptyset$" means that the sets A and B
do not share storage. If A is stored into by S1, and B is the set of
inputs to S2, this means that S1 does not store into any variable input
to S2, i.e., that no result produced by S1 is accessible to S2.

Condtlon  (1),  then,  can  be  Interpreted  to  mean  that  no  variable 
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stored  Into  by  L2  is  also  stored  into  during  any  later  Iteration  of  LI. 
(2)  requires  that  no  variable  Input  to  LI  on  sane  Iteration  be  computed 
on  an  earlier  Iteration  of  L2.  (3)  states  that  no  variable  computed 

by  LI  on  one  Iteration  can  be.  accessed  by  L2  on  some  earlier  Iteration. 
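The three conditions can be checked mechanically once the per-iteration variable sets are known. The following is a minimal sketch in Python, not part of the original text. It assumes the four families of sets are given as equal-length lists of sets of variable names. The function name and the two example loop shapes are hypothetical.

```python
from typing import Sequence, Set


def fusion_conditions_hold(
    R: Sequence[Set[str]],  # R[j]: variables stored into by L1 on iteration j
    A: Sequence[Set[str]],  # A[j]: variables input to L1 on iteration j
    S: Sequence[Set[str]],  # S[j]: variables stored into by L2 on iteration j
    B: Sequence[Set[str]],  # B[j]: variables input to L2 on iteration j
) -> bool:
    """Return True when conditions (1)-(3) of Theorem 1 all hold."""
    n = len(R)  # assumption: all four lists have the same length
    for j in range(n):
        for i in range(n):
            if i < j and (R[j] & S[i]):  # condition (1)
                return False
            if i < j and (A[j] & S[i]):  # condition (2)
                return False
            if i > j and (B[j] & R[i]):  # condition (3)
                return False
    return True


def element_wise(n: int):
    """L1: t[j] <- f(a[j]);  L2: s[j] <- g(t[j])."""
    R = [{f"t{j}"} for j in range(n)]
    A = [{f"a{j}"} for j in range(n)]
    S = [{f"s{j}"} for j in range(n)]
    B = [{f"t{j}"} for j in range(n)]
    return R, A, S, B


def reads_ahead(n: int):
    """Same L1, but L2 reads t[j+1], which L1 has not yet stored in a fusion."""
    R, A, S, _ = element_wise(n)
    B = [{f"t{min(j + 1, n - 1)}"} for j in range(n)]
    return R, A, S, B
```

In this sketch the element-wise pair satisfies all three conditions, while the read-ahead pair violates condition (3).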

Control  reaches  a  statement  S  just  when  S  is  the  next  statement  to 
be  executed. 

Control is absorbed by statement S just when control reaches S,
and never reaches any statement after reaching S.

Define

P(i,j) ≡ "control reaches the test-statement of loop Li for the
jth time"

[Q1]_j ≡ [c((L1,1,j),P') = true]

[Q2]_j ≡ [c((L2,1,j),Q') = true]

[Q3]_j ≡ [c((L3,1,j),P') = true]

∨ [c((L3,1,j),Q') = true]

¬T(i,j) ≡ P(i,j) ∧ [Qi]_j ∧ ¬P(i,j+1)

(¬T(i,j) is true if control is "absorbed" during the jth iteration
of Li.)

Let [R1]_j ≡ [c((L3,1,j),P') = true]

[R2]_j ≡ [c((L3,1,j),Q') = true]

Then T(i,j) ≡ ¬P(i,j) ∨ ¬[Qi]_j ∨ P(i,j+1)

Certain  facts  are  easily  seen  from  examination  of  flow-charts  FI 
and  F2.  These  facts  we  call  "Axioms": 

Axiom 1. P(i,j) → P(i,j-1) ∧ [Qi]_(j-1), i = 1,2,3.

Axiom 2. [Qi]_k → ∀j [0 ≤ j < k → [Qi]_j], i = 1,2,3.

Axiom 3. ¬[Ri]_j → ∀k [k > j ∧ P(3,k) → ¬[Ri]_k]
Lemma 1: P(i,j) ≡ T(i,j-1) ∧ P(i,j-1) ∧ [Qi]_(j-1)

Proof: By definition T(i,j-1) ≡ ¬P(i,j-1) ∨ ¬[Qi]_(j-1) ∨ P(i,j);
from propositional calculus,

T(i,j-1) ∧ P(i,j-1) ≡ [¬[Qi]_(j-1) ∨ P(i,j)] ∧ P(i,j-1),

and T(i,j-1) ∧ P(i,j-1) ∧ [Qi]_(j-1) ≡ P(i,j) ∧ P(i,j-1) ∧ [Qi]_(j-1)

→ P(i,j).

From the definition P(i,j) → T(i,j-1).

By Axiom 1, P(i,j) → P(i,j-1) ∧ [Qi]_(j-1).
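The propositional step in Lemma 1 is small enough to confirm by a truth table. The sketch below is an illustration in Python rather than part of the proof. It treats P(i,j-1), [Qi]_(j-1) and P(i,j) as three free Boolean values, keeps only the assignments allowed by Axiom 1, and checks the claimed equivalence on each. The function and variable names are hypothetical.

```python
from itertools import product


def lemma1_holds_everywhere() -> bool:
    """Check P(i,j) == T(i,j-1) and P(i,j-1) and [Qi]_(j-1) under Axiom 1."""
    for p_prev, q_prev, p_now in product([False, True], repeat=3):
        # Axiom 1: P(i,j) -> P(i,j-1) and [Qi]_(j-1)
        axiom1 = (not p_now) or (p_prev and q_prev)
        if not axiom1:
            continue  # this assignment cannot occur
        # Definition: T(i,j-1) == not P(i,j-1) or not [Qi]_(j-1) or P(i,j)
        t_prev = (not p_prev) or (not q_prev) or p_now
        if p_now != (t_prev and p_prev and q_prev):
            return False
    return True
```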

Lemma 2: P(i,0) →

[P(i,j) ≡ ∀k [0 ≤ k < j →

T(i,k) ∧ [Qi]_k]]

Proof: By induction on j.

Assume P(i,0).

Then P(i,1) ≡ T(i,0) ∧ [Qi]_0 ∧ P(i,0)

P(i,0) → [P(i,1) ≡ T(i,0) ∧ [Qi]_0]

P(i,1) ≡ ∀k [0 ≤ k < 1 → T(i,k) ∧ [Qi]_k].

By Lemma 1, P(i,j) ≡ T(i,j-1) ∧ P(i,j-1) ∧ [Qi]_(j-1);

from the induction hypothesis P(i,j-1) ≡ ∀k [0 ≤ k < j-1 → T(i,k) ∧ [Qi]_k].
Therefore P(i,j) ≡ T(i,j-1) ∧ [Qi]_(j-1) ∧ ∀k [0 ≤ k < j-1 → T(i,k) ∧ [Qi]_k]

or P(i,j) ≡ ∀k [0 ≤ k < j → T(i,k) ∧ [Qi]_k]
Lemma 3: [Qi]_(j-1) ∧ ∀k [0 ≤ k < j → T(i,k)] → P(i,j)

We know that

∀k [0 ≤ k < j → [[Qi]_k ∧ T(i,k)]] → P(i,j), from Lemma 2.
Suppose [Qi]_(j-1). Then ∀k [0 ≤ k < j → [Qi]_k] by Axiom 2.

If ∀k [0 ≤ k < j → T(i,k)], as well, then
∀k [0 ≤ k < j → [[Qi]_k ∧ T(i,k)]],
which by Lemma 2 yields P(i,j).

Therefore, [Qi]_(j-1) ∧ ∀k [0 ≤ k < j → T(i,k)] → P(i,j).
Theorem 2: If a. of Theorem 1 holds, and
if P(i,0), i = 1,2,3, and
c((L1,1,1),x) = c((L3,1,1),x) ∀x ∈ U
then, for all j,

∀k < j T(3,k) ≡ ∀k < j [T(1,k) ∧ T(2,k)]
and

P(3,j) ∧ T(1,j) ∧ T(2,j) →

(X)_j = (X')_j ∧ (X)_j ≐ (X')_j, X = A,B

and (Y)^j = (Y')^j ∧ (Y)^j ≡ (Y')^j, Y = R,S.

Proof: Suppose P(3,j).

Then ∀k [0 ≤ k < j → T(3,k) ∧ [Q3]_k].

Therefore, by the induction hypothesis,

∀k [0 ≤ k < j → T(1,k) ∧ T(2,k) ∧ [Q3]_k]

Also, from the induction hypothesis, and the fact that
P(3,j) → P(3,j-1), we have:

∀k [0 ≤ k < j → [[Q3]_k ≡ [Q1]_k ∨ [Q2]_k]].

∀k [0 ≤ k < j → T(1,k) ∧ T(2,k) ∧ [[Q1]_k ∨ [Q2]_k]].
From Lemma 3, and the statement above, we can deduce that
[Qi]_(j-1) → P(i,j), i = 1,2.
Because P(3,j) → [Q3]_(j-1),
we have [Q1]_(j-1) ∨ [Q2]_(j-1),
giving P(1,j) or P(2,j).

Thus, if L3's test is reached for the jth time, so is either L1's or L2's.
[Q3]_j ≡ [Q1]_j ∨ [Q2]_j, because

∃k < j: c((L1,1,j),P') = v((L1,1,k),P') = v((L3,1,k),P')

= c((L3,1,j),P')


By  Lesma  3,  CQ^lj  — ►  CQ^D j_-j •  Axiom  2,  and 

cVj-i  1  - 1*2 

If ¬[Q3]_j, then the theorem holds, for S[L1,k,j] is a
vacuous statement, which accesses and changes no variables,
and always terminates.

If [Q3]_j then

P(1,j) ∧ [Q1]_j or P(2,j) ∧ [Q2]_j.

If [Q1]_j, then S[L3,1,j] is non-vacuous, and identical
in its set-assignment flow-chart to
S[L1,1,j].

For any x ∈ (A)_j,

because (A)_j ∩ (S)^i = ∅ ∀i < j,

and (S')^i = (S)^i ∀i < j by induction hypothesis,

(A)_j ∩ (S')^i = ∅ ∀i < j.

Therefore, c((L1,1,j),x) ≠ c((L3,1,j),x)
only if x ∈ (R')^i for some i < j.

But (R')^i = (R)^i and (R')^i ≡ (R)^i by induction hypothesis,

so x would be assigned the same value by both S[L1,1,j] and S[L3,1,j].
Therefore, by the properties of identical set-assignments,

S[L3,1,j] terminates ≡ S[L1,1,j] terminates,

and if we assume T(1,j), so that S[L1,1,j] terminates, then
S[L3,1,j] terminates, and
(A')_j = (A)_j ∧ (A')_j ≐ (A)_j and

(R')^j = (R)^j ∧ (R')^j ≡ (R)^j [line (a)]

If ¬T(1,j) were assumed, so that [Q1]_j would be true,
this reasoning shows that ¬T(3,j), for control would
be absorbed in the flow-chart of S[L3,1,j], just as
it was in S[L1,1,j].

If ¬[Q1]_j, both S[L1,1,j] and S[L3,1,j] are vacuous,
and the theorem holds.

Regardless of whether [Q1]_j or not, if P(3,j) ∧ T(1,j),
control reaches the test preceding S[L3,2,j].

Again, if ¬[Q2]_j, both S[L2,1,j] and S[L3,2,j] are
vacuous and the theorem holds.

Otherwise, [Q2]_j ∧ P(2,j).

If b ∈ (B)_j, b may have last been stored into by:

(1) S[L1,1,i], i ≥ 0, or

(2) S[L2,1,k], 0 ≤ k < j, or

(3) None of the above.

Case 1: S[L1,1,i], i ≥ 0.

Then b ∈ (R)^i, b ∉ (R)^k for k > i,
and b ∉ (S)^m for m < j.

c((L2,1,j),b) = v((L1,1,i),b).

Then b ∉ (R)^k for k > i, and b ∉ (S)^m for m < j.

Hence b ∉ (R')^k for j ≥ k > i, and b ∉ (S')^m for m < j.

Since b ∈ (R)^i ∧ b ∈ (B)_j, i must be ≤ j,
for (R)^i ∩ (B)_j = ∅ if i > j.

(R')^i ≡ (R)^i, for i ≤ j, by induction hypothesis,
and line (a) above.

Therefore, since i ≤ j, and b ∉ (R')^k for j ≥ k > i
and b ∉ (S')^m for m < j,

in particular b ∉ (S')^m for j > m ≥ i.

Hence, c((L3,2,j),b) = v((L3,1,i),b)

= v((L1,1,i),b) = c((L2,1,j),b).
Case 2: b last stored into by S[L2,1,k], 0 ≤ k < j:

Then b ∈ (S)^k, 0 ≤ k < j, and

c((L2,1,j),b) = v((L2,1,k),b),

and b ∈ (S)^k and b ∉ (S)^m for j > m > k.

By induction hypothesis, then,

b ∈ (S')^k and b ∉ (S')^m for j > m > k.

c((L3,2,j),b) = v((L3,2,k),b) if
(a) k < j,
and (b) ∄m, k < m < j, such that b ∈ (S')^m,
and (c) ∄i, k < i ≤ j, such that b ∈ (R')^i.

(a) and (b) have been shown.

Since (S)^k ∩ (R)^i = ∅ if i > k,

∄i, j ≥ i > k, such that b ∈ (R)^i = (R')^i, hence (c).
Since k < j, by induction hypothesis

(S)^k ≡ (S')^k, so

c((L3,2,j),b) = v((L3,2,k),b) = v((L2,1,k),b)

= c((L2,1,j),b)

Case 3: Neither 1 nor 2 holds.

Then b ∉ (R)^i, 0 ≤ i, and
b ∉ (S)^k, 0 ≤ k < j.

Then, since c((L1,1,1),b) = c((L3,1,1),b),
and c((L2,1,j),b) = c((L1,1,1),b),
and c((L3,2,j),b) = c((L3,1,1),b),
then c((L2,1,j),b) = c((L3,2,j),b).
Therefore, ∀b ∈ (B)_j,

c((L2,1,j),b) = c((L3,2,j),b) and
hence: S[L2,1,j] terminates if and only if
S[L3,2,j] terminates.

If we assume ¬T(2,j), then S[L2,1,j] and
hence S[L3,2,j] do not terminate, so ¬T(3,j).

If we assume T(2,j), then

(B)_j = (B')_j ∧ (B)_j ≐ (B')_j
∧ (S)^j = (S')^j ∧ (S)^j ≡ (S')^j
by the properties of set-assignments.

We have shown that

P(3,j) → [T(1,j) ∧ T(2,j) →

(X)_j = (X')_j ∧ (X)_j ≐ (X')_j, X = A,B
(Y)^j = (Y')^j ∧ (Y)^j ≡ (Y')^j, Y = R,S
∧ T(3,j)]

and that

P(3,j) → [¬T(1,j) ∨ ¬T(2,j) → ¬T(3,j)].

Therefore P(3,j) → [T(3,j) ≡ T(1,j) ∧ T(2,j)].

By the induction hypothesis, assuming P(3,j),

∀(k < j)T(3,k) ≡ ∀(k < j)[T(1,k) ∧ T(2,k)],

∀(k < j)T(3,k) ∧ T(3,j) ≡ ∀(k < j)[T(1,k) ∧ T(2,k)] ∧
[T(1,j) ∧ T(2,j)],

or P(3,j) → [∀(k ≤ j)T(3,k) ≡ ∀(k ≤ j)[T(1,k) ∧ T(2,k)]].

If ¬P(3,j), then ∃k 0 ≤ k < j ∧ [¬T(3,k) ∨ ¬[Q3]_k];

by the induction assumption, this is equivalent to:
∃k 0 ≤ k < j ∧ [¬[T(1,k) ∧ T(2,k)] ∨ [¬[Q1]_k ∧ ¬[Q2]_k]].
If ∀k [0 ≤ k < j → T(1,k) ∧ T(2,k)], then, since
¬P(3,j) → T(3,j) by its definition,

∀k 0 ≤ k ≤ j T(3,k) by the induction hypothesis and T(3,j).
Also, ∀k [0 ≤ k < j → T(1,k) ∧ T(2,k)].

Therefore, ∃k 0 ≤ k < j : ¬[Q1]_k ∧ ¬[Q2]_k,

∴ ¬P(1,j) ∧ ¬P(2,j)

From the definition, then
T(1,j) ∧ T(2,j),

so ∀(k ≤ j)T(3,k) ≡ ∀(k ≤ j)[T(1,k) ∧ T(2,k)]

If ∃k 0 ≤ k < j ∧ ¬[T(1,k) ∧ T(2,k)],
by the induction hypothesis
∃k 0 ≤ k < j ∧ ¬T(3,k)
and hence

∀(k ≤ j) T(3,k) and ∀(k ≤ j) [T(1,k) ∧ T(2,k)]
are both false.

Therefore, we can assert
∀(k ≤ j) T(3,k) ≡ ∀(k ≤ j) [T(1,k) ∧ T(2,k)]
Proof of Theorem 1:

Part 1: a implies b.

We must show that, given identical initial conditions,

(1) L1;L2 terminates if and only if L3 terminates

(2) If L3 (or L1;L2) terminates, then the results
computed are identical.

We have shown, in Theorem 2, that for all j,

∀(k < j)T(3,k) ≡ ∀(k < j)[T(1,k) ∧ T(2,k)]

or ∀j T(3,j) ≡ ∀j [T(1,j) ∧ T(2,j)].

L1;L2 terminates if and only if

∃j1,j2 : ¬[Q1]_j1 ∧ ¬[Q2]_j2 ∧ ∀(k < j1)T(1,k) ∧ ∀(k < j2)T(2,k).

Suppose L1;L2 terminates.

Then ∃j1,j2 : ¬[Q1]_j1 ∧ ¬[Q2]_j2 ∧ ∀(k < j1)T(1,k) ∧ ∀(k < j2)T(2,k).
Let j = max(j1,j2).

Clearly, ∀(k > j1) ¬P(1,k), and hence T(1,k).

Similarly, ∀(k > j2) ¬P(2,k), and hence T(2,k).

Therefore, ∀(k < j)[T(1,k) ∧ T(2,k)]

Suppose ¬P(3,j). Then ∃(k < j) : ¬[Q3]_k,

for ∀(k < j)T(3,k).

Therefore L3 terminates.

Suppose P(3,j). Then j1 ≤ j ∧ j2 ≤ j,

so P(3,j1) ∧ P(3,j2).

Hence ¬[R1]_j1 ∧ ¬[R2]_j2.

Hence ¬[R1]_j ∧ ¬[R2]_j.

Thus, ¬[Q3]_j, so L3 terminates.
Suppose L3 terminates.

Then ∃j : ¬[Q3]_j ∧ ∀(k < j)T(3,k) ∧ P(3,j)

Hence ∃j : ¬[R1]_j ∧ ¬[R2]_j ∧ ∀(k < j)[T(1,k) ∧ T(2,k)] ∧ P(3,j).

∴ ∃j : ¬[Q1]_j ∧ ¬[Q2]_j ∧ ∀(k < j)[T(1,k) ∧ T(2,k)]

Thus, L1;L2 terminates.

Hence L3 terminates if and only if L1;L2 terminates.

If L3, say, terminates,

∃j : ∀(k < j)[¬[Q3]_j ∧ T(3,k)]
hence ∀(k < j)P(3,k).

Then by Theorem 2, ∀k < j,

(R)^k ≡ (R')^k
and (S)^k ≡ (S')^k

so the results at the time they are computed are identical.

It is conceivable however that a result computed as part of
(S)^i would be destroyed, in L3, by statement S[L3,1,j] for
some j.

This could only happen if j > i and

(S')^i ∩ (R')^j ≠ ∅,

a possibility denied by (a) of Theorem 1, which we are
assuming, and Theorem 2.

If a result of (R')^i were destroyed by some later
iteration of L3, then

(a) If by (R')^j, j > i, the same result would have been
stored by L1.

(b) If by (S')^j, j ≥ i, the same result would have been
stored by L2.

We conclude that L1;L2 ≡ L3.




Proof of Theorem 1. Part 2: b implies a.

We will show that ¬a implies ¬b.

Consider any finite collection of variable-sets

(A)_j, (B)_j, (R)^j and (S)^j.

By using additional variables as loop control elements, we
can create loops L1 and L2 whose jth set-assignments are

(R)^j ← f_j((A)_j)

and (S)^j ← g_j((B)_j) respectively,

where the control variables are designated not part of the
set of "relevant" variables, U.

Now, f_j and g_j can be extremely sophisticated transformations,
capable of testing their inputs for consistency. In
fact, we can assume that if f_j or g_j ever accesses a variable
whose contents differ from those supplied to f_j or g_j on the jth
iteration of L1 or L2, the function produces an "error reaction".
Such an error reaction might take the form of a propagation
of the error, by changing the contents of at least one variable
in every set (A)_j, (B)_j, (R)^j and (S)^j. (Once some variable's
value differs, f_j is not constrained to store only into variables
of (R)^j.)

Thus, if

(2) for any x in (A)_j or (B)_j,

c((L1,1,j),x) ≠ c((L3,1,j),x),

the result computed by L3 will differ from that computed by
L1;L2. We will show that, if a1 or a2 or a3 do not hold,
then some loops L1 and L2 exist for which L1;L2 does not compute
the same result as L3.

Case: Suppose (R)^j ∩ (S)^i ≠ ∅ for some i < j.

Then some x exists,

x ∈ (R)^j ∧ x ∈ (S)^i for some i < j.

After the sequence L1;L2, x's final value was computed
by S[L2,1,i].

After the loop L3, x's final value was computed by
S[L3,1,j]. f_j and g_i can certainly be chosen which
ensure that these values differ.

Case: Suppose (A)_j ∩ (S)^i ≠ ∅ for some i < j.

Then some value input to S[L1,1,j] is computed by
S[L3,2,i] on some iteration i before j. Thus, for
some x ∈ (A)_j, c((L1,1,j),x) ≠ c((L3,1,j),x), and

hence L1;L2 ≢ L3.

Case: Suppose (B)_j ∩ (R)^i ≠ ∅ for some i > j.

Then there is x ∈ (R)^i ∧ x ∈ (B)_j for i > j.

In the L1;L2 sequence,

c((L2,1,j),x) = v((L1,1,i),x)

In L3, c((L3,2,j),x) ≠ v((L3,1,i),x),
for iteration i of L3 follows iteration j.

Therefore, L1;L2 ≢ L3.
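The last case can be made concrete with a small simulation. The following Python sketch is an illustration under stated assumptions, not the construction used in the proof. Memory is a dictionary of named cells, L1 stores t[j] from a[j], and L2 stores s[j] from t[j + offset]. All function names, the cell names, and the arithmetic chosen for f and g are hypothetical. With offset 0 the conditions of Theorem 1 hold; with offset 1, L2 reads a cell that a later iteration of L1 stores, which is the violation of condition (3).

```python
def make_l1(n):
    """L1, iteration j: t[j] <- a[j] + 1."""
    def step(j):
        def run(mem):
            mem[f"t{j}"] = mem[f"a{j}"] + 1
        return run
    return [step(j) for j in range(n)]


def make_l2(n, offset):
    """L2, iteration j: s[j] <- 2 * t[j + offset] (clamped to the last cell)."""
    def step(j):
        src = min(j + offset, n - 1)
        def run(mem):
            mem[f"s{j}"] = 2 * mem[f"t{src}"]
        return run
    return [step(j) for j in range(n)]


def initial_memory(n):
    mem = {f"a{j}": 10 * j for j in range(n)}
    mem.update({f"t{j}": 0 for j in range(n)})
    mem.update({f"s{j}": 0 for j in range(n)})
    return mem


def run_sequential(mem, l1, l2):
    """Execute L1;L2: every iteration of L1, then every iteration of L2."""
    mem = dict(mem)
    for step in l1:
        step(mem)
    for step in l2:
        step(mem)
    return mem


def run_fused(mem, l1, l2):
    """Execute L3: iteration j of L1 followed at once by iteration j of L2."""
    mem = dict(mem)
    for s1, s2 in zip(l1, l2):
        s1(mem)
        s2(mem)
    return mem
```

With offset 1 the fused loop reads t1 before L1's second iteration has stored it, so s0 is computed from the initial contents rather than from L1's result.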
II.8 Storage Savings in Loop Fusion

Once we have established the conditions under which two loops can
fuse without changing their effect, there remains the question of the
storage saving that results. We will concentrate on the intermediate
storage.

Let L1, L2, and L3 be as before.

Define B = ∪_j (B)_j, R = ∪_j (R)^j.

Recall that size(A), where A is a set, is defined to be the
number of members of A.

Then T = B ∩ R is the set of variables used to communicate results
from L1 to L2.

We define T to be the intermediate storage set.

We will suppose that R is not input to any statement
following L1;L2.

On the kth iteration of L3, certain variables must be in existence
simultaneously. Those which are intermediate, that is, members of T, are
those which

(1) are computed in (R)^i for i ≤ k
(since for i > k, they have not yet been computed and need
not be present) and

(2) are accessed by some (B)_j or (A)_j for j ≥ k
(since if all such j's are j < k, then the variable
has already been used, and won't be needed again).

This set is T_k = ∪ over P(i,j,k,m) of [[(R)^i ∩ (B)_j] ∪ [(R)^i ∩ (B)_m ∩ (A)_j]]

where P(i,j,k,m) is i ≤ k ≤ j ∧ i ≤ m ≤ k.

Here, (R)^i ∩ (A)_j contains those variables computed in (R)^i and accessed
in (A)_j. Some of them may not be in T. (R)^i ∩ (A)_j ∩ B gives the set
in T. For m ≥ k, (R)^i ∩ (B)_m ⊇ (R)^i ∩ (B)_m ∩ (A)_j, so these are included
in the first term of T_k. This leaves (R)^i ∩ (A)_j ∩ (B)_m for i ≤ k ∧ m < k.

Since, in order to fuse, (R)^i ∩ (B)_m = ∅ when i > m, we get

(R)^i ∩ (A)_j ∩ (B)_m for i ≤ m < k.

If we add two assumptions to those necessary for loop fusion, we
can reduce the intermediate storage requirement still further:

Suppose (R)^i ∩ (B)_j = ∅ if i ≠ j

and (R)^i ∩ (A)_j ∩ (B)_i = ∅ if i < j.

Then, the intermediate storage set on the kth iteration becomes
T'_k = (R)^k ∩ (B)_k

and, since successive T'_k's can share the same storage,

only max over k of size((R)^k ∩ (B)_k)

variables are needed for intermediate results.

There is a second advantage to the assumption
(R)^i ∩ (B)_j = ∅ if i ≠ j.

This assumption allows the two loops to be tested separately for their
fusion characteristics.
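The saving can be illustrated numerically. The sketch below, in Python, is a hedged illustration only. It assumes the per-iteration sets are given as lists of sets of variable names, checks the two added assumptions, and compares size(T) with the maximum of size((R)^k ∩ (B)_k). The function names are hypothetical.

```python
def intermediate_storage(R, B):
    """T = B ∩ R, where B and R are the unions of the per-iteration sets."""
    all_r = set().union(*R)
    all_b = set().union(*B)
    return all_r & all_b


def fused_storage_need(R, A, B):
    """max over k of size(R[k] ∩ B[k]), valid only under the two added assumptions."""
    n = len(R)  # assumption: n >= 1 and all lists have length n
    for i in range(n):
        for j in range(n):
            if i != j and (R[i] & B[j]):
                raise ValueError("assumption (R)^i ∩ (B)_j = ∅ for i ≠ j fails")
            if i < j and (R[i] & A[j] & B[i]):
                raise ValueError("assumption (R)^i ∩ (A)_j ∩ (B)_i = ∅ for i < j fails")
    return max(len(R[k] & B[k]) for k in range(n))


def example_sets(n):
    """L1 stores t[k] from a[k]; L2 reads t[k] on the same iteration k."""
    R = [{f"t{k}"} for k in range(n)]
    A = [{f"a{k}"} for k in range(n)]
    B = [{f"t{k}"} for k in range(n)]
    return R, A, B
```

For the example sets with n = 5, T has five members while the fused loop needs room for only one intermediate variable at a time.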

Let (T)_i = T ∩ (B)_i

(T)^i = T ∩ (R)^i

Theorem: (R)^i ∩ (B)_j = ∅ if i ≠ j is equivalent to
[(T)_i = (T)^i and (T)_i ∩ (T)_j = ∅ if i ≠ j.]

Proof: Assume (R)^i ∩ (B)_j = ∅ if i ≠ j.

(T)_k = T ∩ (B)_k = R ∩ B ∩ (B)_k = R ∩ (B)_k

(T)^k = T ∩ (R)^k = B ∩ R ∩ (R)^k = B ∩ (R)^k

But R ∩ (B)_k = (R)^k ∩ (B)_k = B ∩ (R)^k as a
consequence of the hypothesis.

Therefore (T)_k = (T)^k.

If (R)^i ∩ (B)_j = ∅ if i ≠ j,

since (T)_j ⊆ (B)_j,

and (T)_i = (T)^i ⊆ (R)^i,

therefore (T)_j ∩ (T)_i ⊆ (B)_j ∩ (R)^i = ∅ if i ≠ j.

Assume (T)_i = (T)^i and (T)_i ∩ (T)_j = ∅ if i ≠ j.

Then (T)_j = T ∩ (B)_j = R ∩ B ∩ (B)_j = R ∩ (B)_j

and (T)_i = (T)^i = T ∩ (R)^i = B ∩ (R)^i.

When (T)_j ∩ (T)_i = ∅,

∅ = [R ∩ (B)_j] ∩ [B ∩ (R)^i] = (B)_j ∩ (R)^i,

since (B)_j ⊆ B and (R)^i ⊆ R.

Hence (B)_j ∩ (R)^i = ∅ if i ≠ j.


II.9 Summary of Loop Fusion Conditions

We have described a set of conditions which guarantee that two loops,
L1 and L2, can fuse to yield one loop, computationally equivalent to the
sequence L1;L2. A slight strengthening of these conditions results in
a fusion which uses less intermediate storage.

Let L1;L2 be

L1: [R ← f(A)]

L2: [S ← g(B)]

Then there is a loop

L3: [R' ← f(A'); S' ← g(B')]

constructed by fusing the flowcharts of L1 and L2 in such a way that
statements of L1 alternate with statements of L2.

If

(1) (R)^j ∩ (S)^i = ∅ if i < j

(2) (R)^j ∩ (B)_i = ∅ if i < j

(3) (S)^j ∩ (A)_i = ∅ if i > j

all hold, then

L1;L2 is computationally equivalent to L3.

If initial conditions agree, so that

c((L1,1,1),v) = c((L3,1,1),v) for all v in A ∪ B,

then finally for all j

(X)_j = (X')_j, X ∈ {A,B,R,S}

(Y)_j ≐ (Y')_j, Y ∈ {A,B}

(Z)^j ≡ (Z')^j, Z ∈ {R,S}

If the storage conditions

(1') (R)^j ∩ (S)^i = ∅ if i < j (same as (1))

(2') (R)^j ∩ (B)_i = ∅ if i ≠ j

(3') (S)^j ∩ (A)_i = ∅ if i > j (same as (3))

(4') (R)^i ∩ (B)_i ∩ (A)_j = ∅ if i < j

all hold, then the storage needed for communication from L1 to L2,

T = ∪ over i,j of [(B)_i ∩ (R)^j]

becomes

T' = max over k of [(B')_k ∩ (R')^k] in L3

(Here "max" of a series of sets selects that set which contains most
members.)
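A concrete pair of loops may help fix the idea. The following Python sketch is an illustration only, with hypothetical function names and arbitrarily chosen arithmetic for f and g. The two-loop version communicates through an N-element array t, while the fused version reuses a single cell, as the storage conditions permit when L2's jth iteration reads only what L1's jth iteration stored.

```python
def two_loops(a):
    """L1;L2 with an N-element intermediate array t."""
    n = len(a)
    t = [0] * n
    for j in range(n):        # L1: t[j] <- a[j] * a[j]
        t[j] = a[j] * a[j]
    s = [0] * n
    for j in range(n):        # L2: s[j] <- t[j] + 1
        s[j] = t[j] + 1
    return s


def fused_loop(a):
    """L3: statements of L1 alternate with statements of L2."""
    n = len(a)
    s = [0] * n
    for j in range(n):
        t = a[j] * a[j]       # one intermediate cell, reused on every iteration
        s[j] = t + 1
    return s
```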


II.10 Algol and Flow-Chart Language


We have introduced a flow-chart language to clarify the meaning
of certain program-parts. However, we consider any programming-language
construct having the same meaning as one of these terms to be the "same"
as that term. In particular, we will often refer to restricted forms
of certain programming constructs in Algol.

Algol of course contains "assignment statements". Suitably restricted
versions of these, which store into only one relevant program variable,
agree with our (unstated previously) concept of an "assignment statement".

Furthermore, certain Algol FOR-statements agree with our definition
of "loop". An Algol FOR-statement whose FOR-clause specifies no GOTO's
and whose body, S, specifies no GOTO's which lead outside S satisfies
our definition of loop. (A GOTO may be specified implicitly, as part
of a procedure body which is called in the text, or explicitly in the
text. Both are excluded.)

II.11 Algorithm Fusion (parallel connection)

We wish to describe how two algorithms, which occur in sequence,
can fuse to reduce the storage needed for communication from the first
to the second. We will base our analysis on the properties which allow
loops to fuse. Here, we must isolate the loops in the two algorithms
which can fuse to save storage. If these loops are separated by one or
more statements, they must be rearranged, preserving computational
equivalence, to make them adjacent.

Let S1 and S2 be adjacent simple algorithms. Let R be a set of
variables which is the result-set of S1, and an input-set to S2.

Under certain circumstances, S1 and S2 can combine, or parallel connect,
allowing R, the storage used to communicate between S1 and S2, to be
reduced in size.

Conditions:

(1) The result computed by S1 into R is input to no statement
sequence other than S2;

(2) There is a loop, L1, of S1 which encloses all statements
of S1 which store into R;

(3) There is a loop, L2, of S2 which encloses all statements
of S2 which access the values stored in R by L1.

Let (R)^j be the set of elements of R which are stored into during the
jth iteration of L1. Let (R)_j be the set of elements of R which are input
during the jth iteration of L2.

Further conditions:

(4) (R)^j = (R)_j for all j such that there is a jth iteration of
L1 and of L2.

(5) (R)^j ∩ (R)_i = ∅ if i ≠ j

(6) S1's change-set is disjoint from each input to S1, and from
the change-set, S, of S2.

(7) No variable stored into by S2 is accessed by S1.

Theorem: If conditions 1-7 above are met, there exists a single algorithm
S3 computationally equivalent to the sequence S1;S2.
Furthermore, R may be replaced by a smaller set V of variables,
where size(V) = max over j of [size((R)^j)].

Proof: Conditions (1), (4)-(7) imply that L1 and L2 may fuse if they
are adjacent. If L1 and L2 fuse, then these same conditions
allow R to be reduced in size to V, where size(V) is as
above. We must show that L1 and L2 can be made to be adjacent.

In the statement sequence S1;S2, suppose there are
statements S0 following L1 in S1. Since, by (2), L1 is the
only statement in S1 which stores into R, S0 does not store
into R. Since R is the only input to S2 which is computed
by S1, and since no variable accessed by S1 is stored into by
S2, S0 can follow S2 without affecting the computation. Now,
suppose there are statements S4 of S2, which precede L2 in
S2. Since all accesses of R lie in L2, S4 cannot access R.
Since S2 does not store into any variable accessed by S1,
neither does S4. Further, S4 cannot depend on any result
computed by S1, for it does not depend on R's contents,
and, since R is S1's result-set, it does not depend on any
other variable's contents computed by S1. Therefore, S4 can
be moved to precede S1 without changing the computational
effect. The result of these moves is a statement sequence
in which no statements intervene between L1 and L2. L1 and
L2 can then fuse, and the result follows.

II.12 Matrix Operation Algorithms (MOA's)

The results of the previous section can be applied to the specific
algorithms which are used in computing matrix arithmetic expressions.

A matrix operation algorithm (MOA) is a simple algorithm which
computes an associated matrix assignment statement. Syntactically, a
matrix assignment statement is written "V←E", where V is a matrix
identifier, and E is a matrix arithmetic expression. Semantically, this
matrix assignment statement commands the replacement of the contents of
the array named V with the value of the matrix arithmetic expression, E.
The value of E is computed from the contents its variables hold
immediately before the statement is executed. The result-set of the MOA is
the array V; the input-set is the union of the arrays whose names appear
in E. A copy of an MOA is a systematic substitution instance of the MOA.
A substitution instance of an MOA results from substituting new arrays
for the arrays referenced by the MOA. Suppose an MOA, B, computes the
matrix assignment statement "V←E". A substitution, S, which changes
array X to S(X), is systematic if and only if, for all arrays R whose
name occurs in E, if name(R) = name(V) then name(S(R)) = name(S(V))
and if name(R) ≠ name(V) then name(S(R)) ≠ name(S(V)). The same
substitution may then be applied to the MOA's matrix assignment statement,
to yield the new MOA's associated matrix assignment statement.

For example,

A←B*C, A←B*B, A←D*E are all copies of X←Y*Z. However, A←A*C is not
a copy of X←Y*Z, for since X is the variable stored into, and X≠Y, the
new names for these variables, S(X) and S(Y), must not agree.
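The definition of a systematic substitution translates directly into a check. The Python sketch below is an illustration, not part of the original text. It assumes a statement V←E is represented by the name of V and the list of array names occurring in E, and a substitution by a dictionary from old names to new names (names not listed are left unchanged). The function name is hypothetical.

```python
def is_systematic(target, operands, subst):
    """True when subst preserves, for every operand R, whether name(R) = name(V)."""
    new_target = subst.get(target, target)
    for r in operands:
        new_r = subst.get(r, r)
        # name(R) = name(V) must hold exactly when name(S(R)) = name(S(V))
        if (r == target) != (new_r == new_target):
            return False
    return True
```

Applied to X←Y*Z, the substitutions giving A←B*C, A←B*B and A←D*E pass this check, and the one giving A←A*C does not.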


A substitution which is not systematic cannot in general be applied
to an MOA without changing the MOA's assignment statement radically. For
example, suppose the (not systematic) substitution

A replaced by A
C replaced by A
B replaced by B

is applied to Algorithm 1 of the n^3 algorithms.

The algorithm becomes:

I → N
  J → N
    A[I,J] ← 0
    K → N
      A[I,J] +← A[I,K] * B[K,J]

This algorithm most definitely fails to compute
A ← A * B,

for some element of each row of A is set to zero before it can be
accessed. Precisely what it computes is not clear, but it certainly
does not compute the value of A * B, and store this value in A, as
the assignment statement A ← A * B requires.
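The failure is easy to reproduce. The Python sketch below is a hedged illustration: it assumes Algorithm 1 has the triple-loop shape shown above, uses lists of lists for arrays, and lets the target array alias an operand to model the non-systematic substitution. The function name is hypothetical.

```python
def multiply_into(A, C, B):
    """Triple-loop product A <- C * B; the arguments are allowed to alias."""
    n = len(A)
    for i in range(n):
        for j in range(n):
            A[i][j] = 0                        # overwrites A[i][j] before any use
            for k in range(n):
                A[i][j] += C[i][k] * B[k][j]   # reads C[i][k], which is A[i][k] if aliased
    return A
```

Called with a distinct target array the routine returns the matrix product. Called with the same array as target and left operand, it reads entries it has already zeroed or overwritten, and the result is no longer the product.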

An algorithm to compute any matrix arithmetic expression can be
constructed from copies of the MOA's in a suitably chosen basic set of
MOA's. An example of a basic set of MOA's is any set containing MOA's
to compute the matrix assignment statements

A ← B + C and A ← B * C

Suppose we are given a set of MOA's containing MOA's that compute
A ← B + C and A ← B * C.

We can use sequences of copies of these MOA's to compute the value of
any matrix arithmetic expression, E.
Proof: by induction on the number of operators in the expression E.

If E is a matrix identifier, V,
then the value of E is defined to be the contents of V. But
the null sequence of MOA-copies computes just this, in V.

If E is of the form V1 op V2, where V1 and V2 are matrix identifiers,
and op is + or *, then the single MOA-copy which computes
Z ← V1 op V2 computes the value of E into Z. Z must be
chosen to differ in name from both V1 and V2.

If E is of the form E1 op E2, where E1 and E2 are expressions
containing at least one operator, and op is + or *, then
E1 and E2 can each be computed, into any prescribed arrays
X1 and X2, by sequences of MOA-copies C1 and C2. Choose
X1 different from any array occurring in C2, and write
the sequence "C1; C2; Z ← X1 op X2", where Z differs from X1
and from X2. This sequence computes the value of E
into Z.

Let us call this technique COMF1.
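The construction in the proof can be sketched as a short recursive translator. The Python below is an illustration under stated assumptions, not the thesis's own procedure. Expressions are nested tuples (op, left, right) with strings as matrix identifiers. Every operator receives a fresh array named T1, T2, ..., which is assumed not to clash with any identifier in the expression, and mixed operands (one identifier, one subexpression) are handled by the same rule. The function name `comf1` and the tuple encoding are hypothetical.

```python
from itertools import count


def comf1(expr, fresh=None):
    """Translate expr into (list of MOA-copies, name of the array holding its value).

    Each MOA-copy is a tuple (target, op, left, right) standing for
    target <- left op right.
    """
    if fresh is None:
        fresh = (f"T{n}" for n in count(1))   # supply of unused array names
    if isinstance(expr, str):                 # a matrix identifier: null sequence
        return [], expr
    op, left, right = expr
    c1, x1 = comf1(left, fresh)               # C1 computes E1 into X1
    c2, x2 = comf1(right, fresh)              # C2 computes E2 into X2
    z = next(fresh)                           # Z differs from X1, X2 and all earlier arrays
    return c1 + c2 + [(z, op, x1, x2)], z
```

Because every result goes into a new array, the constraints on X1 and Z are met trivially, at the cost of one intermediate array per operator.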

The sequence of MOA's produced by COMF1 for a given expression E
requires several arrays to hold values of subexpressions of E. The
semantics of the expression prevents these intermediate values from
ever being input to another statement in the program. In fact, the
result produced by each MOA in the sequence, if it is not the final
MOA's result, is input to only one other MOA. The sequence also has
the property that, if the expression E contains no common subexpressions,
no subexpression of E is ever computed more than once. Thus, this
technique produces a minimum-connection-time MOA to compute any matrix
arithmetic expression.

Refinements of COMF1 for translating matrix arithmetic expressions
into sequences of MOA-copies can be derived, which reduce the
number of arrays used in producing the expression's value. Notice that
the technique given requires that certain arrays must be chosen to be
different from others used in certain MOA's. Thus notice the constraint
on X1 imposed in COMF1. This prevents one algorithm's result-set from
being stored into during the course of another, before that result-set
is accessed by the algorithm which must receive its contents. However,
it forces arrays not needed to hold the value of the expression to be
used to carry these intermediate results. Some of these intermediate
arrays can be eliminated.

At least two techniques for refining COMF1 exist. One can, using
only the two given MOA-assignments, reduce the number of intermediate
arrays used, by re-ordering the sequence of execution of the component
MOA's. This technique works by changing the statements over which a
given intermediate array must remain intact, and hence, be a distinct,
non-usable storage area. The maximum number of such distinct intermediate
arrays occurring in an expression's translation is the number of arrays
needed to compute that expression. This number can be minimized.
Alternatively, one can develop new algorithms, capable of computing expressions
with more operators. If these algorithms themselves each need so little
temporary storage (say, a factor of N less than an array) that it can be
discounted, then "larger" expressions can be computed using
only one array to hold results. Both techniques can be employed together,
as well.

From certain sequences of two adjacent simple algorithms, we can
reduce the storage needed for their communication to the amount needed
during one of their outer loop iterations. Suppose that we have
translated some expression, using COMF1. The resulting sequence of MOA's
has many pairs of communicating adjacent MOA's, each pair using an array
for communication. This suggests the use of algorithm parallel
connection, to produce a collection of algorithms for the basic set which
compute expressions having more than one operator, and yet need a
negligible amount of intermediate storage.

II.13 Shapes, the "Valences" of an MOA

A convenient summary of the combining properties of MOA's can be
devised. First, note that the variables used for communication between
two algorithms are, in the case of MOA's, organized into arrays.
Furthermore, COMF1 shows that we can freely choose these arrays. In fact,
we could choose a distinct array for each pair of communicating MOA's
in the sequence COMF1 produces. Note also that once an array of the
sequence is accessed by an MOA, its contents are not needed by any other
MOA. These considerations, checked against the requirements which allow
two algorithms to parallel-connect, rapidly reduce the potentially
unsatisfied conditions.

Of the conditions allowing two algorithms to parallel-connect, only
a few are possibly unsatisfied in an appropriate sequence of MOA's. Since
any communicating storage in such a sequence takes the form of an array,
it seems natural to investigate the remaining requirements by
characterizing each array used to hold inputs or results of an MOA.

Let us define, for each array X of an MOA which occurs in a loop L
of the MOA, the element-sets of that array. $(X)^{\mathrm{in}}_{L,j}$ will
represent the subset of variables of array X input to L during L's jth
iteration. Similarly, $(X)^{\mathrm{out}}_{L,j}$ will represent the subset
of X stored into during L's jth iteration. For any array X, these sets
represent a subset of X's variables selected by a certain subset of the
possible subscript combinations by which elements of X are selected.
Let the collection of subscripts selecting $(X)^{\mathrm{in}}_{L,j}$ be
$[X]^{\mathrm{in}}_{L,j}$; similarly let $[X]^{\mathrm{out}}_{L,j}$ select
$(X)^{\mathrm{out}}_{L,j}$.

Now define the shape associated with an array X to be the sequence
of subscript-sets $[X]^{\mathrm{in}}_{L,j}$ or $[X]^{\mathrm{out}}_{L,j}$,
j = 1, 2, ..., according as X is input to or stored into during a single
outermost loop L of the algorithm. If X occurs in more than one outermost
loop of the algorithm, let its shape be defined to be Ω. Furthermore, if
$[X]^{\mathrm{in}}_{L,i} \cap [X]^{\mathrm{in}}_{L,j} \neq \emptyset$ (or
$[X]^{\mathrm{out}}_{L,i} \cap [X]^{\mathrm{out}}_{L,j} \neq \emptyset$)
for some $i \neq j$, let X's shape be Ω.

The shape Ω is assigned to an array to prevent fusion at this array
with other MOA's. Assignment of Ω to an array in effect demands the
presence of an entire array to hold values during a computation.
Whenever it is difficult to assign a "combinable" (non-Ω) shape to some
array, Ω may be assigned, without altering the correctness of the
algorithm resulting from parallel-connection at that array.

The concept of "shape" is our promised useful summary of an
algorithm's parallel-connection property. In some sense, it corresponds
to a chemist's "valence": we state that MOA B can parallel connect to
MOA C at input X of C if the shape of the result-array of B and the shape
of X in C match. Two shapes match if neither is Ω, and if they are
equal, element-set by element-set, for the first K element-sets of the
sequences, where K is defined as the largest number j such that iteration
j of either loop exists. We call the shape associated with an input array
X of an MOA C the access-characteristic of X in C. The shape associated
with the result-array of an algorithm C is called the
result-characteristic of C. The algorithm resulting from the parallel
connection is called the fusion of B to input X of C.

We can simplify shape-comparisons considerably by assigning
descriptive names to the most commonly occurring shapes. The table below
lists these names, together with a one-letter abbreviation for each,
defining them in terms of the subscripts by which their jth element-set
is selected:

r - row. All [j,i] such that 1 ≤ i ≤ N

c - column. All [i,j] such that 1 ≤ i ≤ N

Ω - Ω. No subscript set corresponds. Given to "unfuseable"
arrays.
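
As a rough illustration of the definitions above, the following Python sketch builds the row and column shapes as explicit sequences of subscript-sets and tests whether two shapes match. It is only a sketch under stated assumptions: the names `row_shape`, `column_shape`, `shape_of` and `shapes_match` are hypothetical, `None` stands in for the unfuseable shape, and two shapes with different numbers of element-sets are simply treated as not matching.

```python
from typing import FrozenSet, List, Optional, Tuple

Subscript = Tuple[int, int]
# A shape is a sequence of subscript-sets; None stands for the unfuseable shape.
Shape = Optional[List[FrozenSet[Subscript]]]


def row_shape(n: int) -> Shape:
    """Shape r: the j-th element-set is all [j,i] with 1 <= i <= n."""
    return [frozenset((j, i) for i in range(1, n + 1)) for j in range(1, n + 1)]


def column_shape(n: int) -> Shape:
    """Shape c: the j-th element-set is all [i,j] with 1 <= i <= n."""
    return [frozenset((i, j) for i in range(1, n + 1)) for j in range(1, n + 1)]


def shape_of(element_sets: List[FrozenSet[Subscript]]) -> Shape:
    """Return the shape, or None if two iterations touch a common subscript."""
    seen = set()
    for subscripts in element_sets:
        if seen & subscripts:
            return None  # overlapping element-sets: assign the unfuseable shape
        seen |= subscripts
    return list(element_sets)


def shapes_match(a: Shape, b: Shape) -> bool:
    """Two shapes match if neither is unfuseable and they agree set by set."""
    if a is None or b is None:
        return False
    # Assumption: differing iteration counts are treated as a mismatch.
    return a == b
```

For N = 3, a result array produced row by row matches an input consumed row by row, but not one consumed column by column.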

Theorem: If the result-characteristic of MOA A matches the access-
characteristic of input X of MOA B, and if A's change-set is
disjoint from each of A's input-sets, and if B's result array
is disjoint from X, then there exists an algorithm C such that

(1) C computes a matrix assignment statement derived
by substituting the expression of a substitution
of A's associated matrix assignment statement for
each occurrence of X in B's matrix assignment
statement.

(2) C is constructed as the sequence S(A); S(B), with
S(A) parallel connected to input X of B. S(A) is
a systematic substitution of A, such that A's
result becomes X in S(A). X in B remains X in
S(B), but B's change-set is chosen differently
from any variable of A.

Proof: The sequence S(A); S(B) computes, in S(B)'s result-set, the
required matrix assignment statement. We will take this as the
result of C. We can therefore assume that no value computed
into variables not in the result-set of B is input to any
later statement in the program, by the definition of
result-set. It remains to show that S(A); S(B), a sequence of two
adjacent simple algorithms, can fuse.

We observe that:

(1) The result computed by S(A) into X is input to no
statement-sequence other than S(B), by our
definition of the result-set of the sequence.

(2) The result-shape of A, and hence of S(A), does not
equal Ω. Therefore, there is a single outermost
loop L1 of S(A) which encloses all statements of
S(A) which store into X.

(3) Similarly, the access-shape of X in B, and hence in
S(B), does not equal Ω. Hence there is a single
outermost loop L2 of S(B) which encloses all
statements which access X in S(B).

Let $(X)^{\mathrm{out}}_j$ stand for the subset of variables of X
stored into during L1's jth iteration, and $(X)^{\mathrm{in}}_j$
be the subset of X input during the jth iteration
of L2. Let $[X]^{\mathrm{out}}_j$ and $[X]^{\mathrm{in}}_j$ be the
subscript-sets which select them.

(4) The result-shape of A equals the access-shape of X
in B, because the non-Ω shapes match. Hence, the
subscript-sets $[X]^{\mathrm{out}}_j = [X]^{\mathrm{in}}_j$ for all $j \le K$. But
applying identical subscripts to the same array, X,
selects identical variables. Hence
$(X)^{\mathrm{out}}_j = (X)^{\mathrm{in}}_j$ for all $j \le K$, that is, for all j
such that there is a jth iteration in L1 and L2.

(5) $(X)^{\mathrm{out}}_i \cap (X)^{\mathrm{out}}_j = \emptyset$ if $i \neq j$, for if this
weren't so, the result-characteristic of A would be Ω.

(6) S(A)'s change-set is disjoint from each of S(A)'s
input-sets by assumption. S(A)'s change-set is
disjoint from S(B)'s change-set by construction of
the substitutions.

(7) S(B)'s change-set is disjoint from any variable in
S(A), so no variable stored into by S(B) can be
accessed by S(A).

(8) No variable stored into by S(A) is input to any
statement other than S(B), by our definition of
the sequence's result-set to include only variables
in S(B)'s result-set.

Thus, all the hypotheses of the parallel-connection theorem are satisfied.

II.14 Explicit Rule for Developing Shapes for Arrays Used in Algol Programs

In general, a shape cannot be calculated for each array occurring in
an MOA. The difficulty arises because the element-sets which constitute
the shape may depend on values computed during the algorithm. In general,
because we cannot predict these values, we cannot decide the membership
of the element-sets.

The Algol programs we use as illustrations, however, all reference
element-sets in a particularly simple way. Their loops are thoroughly
predictable FOR-statements, for which the value of the FOR-statement's
index at the start of the loop's jth iteration is easily calculable.
These indices are the only variables appearing in array subscripts.
Furthermore, no conditional statements occur to skip statements, leaving
variables unreferenced, even though their subscript-combination
apparently appears. For these loops, we can, and do, calculate shapes.

II.15 Parallel Connection Algorithm

Suppose we are given two Algol simple algorithms A and B such that
A's result is input to B. Suppose that A's result-characteristic and
B's access-characteristic match. Then A and B may be parallel-connected.
Here, we make explicit how.

First, set down A immediately preceding B. A's result-characteristic
is a non-Ω shape. Therefore, there exists an outermost loop, L1, of A,
enclosing all statements of A which change A's result. Similarly, an
outermost loop of B, L2, exists, enclosing all accesses of A's result.
Move any intervening statements out from between L1 and L2, moving those
statements of A after B and those of B before A. Fuse L1 and L2, by
deleting L2's controlling for-clause, and the now-adjacent end-begin
pair which enclosed it. Replace the for-clause with statements to ensure
that L2's index is stepped in exactly the way it was stepped by L2's
for-clause. Now, substitute an intermediate variable name for A's result
throughout the combined algorithms.

This technique of fusing two Algol loops fails to account for
possible "conflicts" of indexes. Semantically, an Algol FOR-loop's
controlled variable, or index, takes on an "undefined" value after the
FOR-list is exhausted. We can take this to mean that this value may not
be input to any statement in the program. Therefore, we may substitute
a new variable for the FOR-statement's index, without changing the
meaning of that FOR-statement. We use this property to avoid such
conflicts.

Suppose I1 is L1's index, and L1 and L2 are to fuse. If, in L2,
I1 is stored into, we say a "conflict on I1" exists. To avoid it, we
simply substitute for I1 a variable not occurring in L1 or L2.

We have been somewhat vague in describing how we ensure that L2's
index is stepped exactly the same way L2's FOR-clause stepped it.
In general, this can be accomplished by substituting an appropriate
sequence of conditional statements and assignments. The Algol report [6]
suggests such sequences for each FOR-list element-type. By "expanding"
the FOR-clause into the simpler statements it abbreviates, and then
redetermining the statement the loop's exit is to reach, L2's index can
be stepped.

In certain loop-fusions, a simpler approach can be used. Suppose
that L1 and L2 are to fuse, and that each iterates the same number of
times. Suppose also that I1 is L1's index and I2 is L2's.

Suppose that the first statement in the Algol text of each loop's
body is given the number "1", and the last in loop L1's text is numbered K1.
If there is a function F such that F(v((L1,K1,j),I1)) = v((L2,1,j),I2)
for all j, then statements to compute I2 from I1's current value directly
may replace I2's iterative computation. These statements may be placed
just before the first statement of L2's body in the fusion. If no
statement of L2 stores into I2 (the usual case), then F(I1) may replace
each occurrence of I2. (In many cases, such an F exists: the identity
function.)

Example 1

a:  I → N                              A: r
      J → N                            B: n
        C[I,J] ← 0                     C: r
        K → N
          C[I,J] + ← A[I,K] * B[K,J]

b:  I → N                              D: r
      J → N                            E: n
        A[I,J] ← 0                     A: r
        K → N
          A[I,J] + ← D[I,K] * E[K,J]

We can reduce A from a 2-array to a 1-array.

Fusion begun: delete the second for-clause.

b:  I → N
      J → N
        A[I,J] ← 0
        K → N
          A[I,J] + ← D[I,K] * E[K,J]
a:  (I → N)
      J → N
        C[I,J] ← 0
        K → N
          C[I,J] + ← A[I,K] * B[K,J]

Reduce the size of A to a row by assigning the Ith element of the
Jth row to U[I]; i.e., substitute U[x] for A[y,x] wherever A occurs:

    I → N
      J → N
        U[J] ← 0
        K → N
          U[J] + ← D[I,K] * E[K,J]
      J → N
        C[I,J] ← 0
        K → N
          C[I,J] + ← U[K] * B[K,J]

Shape characteristics:  D: r
                        E: n
                        B: n
                        C: r
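
The effect of the fusion in Example 1 can be checked with a small Python sketch. It is an illustration only, not a transcription of the Algol text: the function names `product_two_arrays` and `product_fused` are hypothetical, indices run from 0 rather than 1, and matrices are assumed to be square lists of lists. The first function keeps the full intermediate array A; the second follows the fused program and keeps only the row buffer U.

```python
def product_two_arrays(D, E, B):
    """Unfused form: compute A = D*E into a full array, then C = A*B."""
    n = len(D)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                A[i][j] += D[i][k] * E[k][j]
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def product_fused(D, E, B):
    """Fused form: one outer loop over rows, with A reduced to the row U."""
    n = len(D)
    C = [[0] * n for _ in range(n)]
    U = [0] * n  # holds row i of D*E during outer iteration i
    for i in range(n):
        for j in range(n):
            U[j] = 0
            for k in range(n):
                U[j] += D[i][k] * E[k][j]
        for j in range(n):
            C[i][j] = 0
            for k in range(n):
                C[i][j] += U[k] * B[k][j]
    return C
```

Both functions perform the same multiplications and additions; only the storage used for the intermediate result differs, N*N variables in the first against N in the second.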


J is the index of b; K that of a. Simple replacement
conditions are met. An F(J) which can replace K throughout
a is 'J'. A name conflict arises, so choose J1 for
both J in b and K in a. Let U[x] = A[x,y],
and upon fusing, we get

II.16 Matrix Elementary Algorithms

Let us define the matrix elementary algorithms of size k (k-MEA's)
to be a subset of the MOA's satisfying certain constraints:

(1) There is a single loop in each algorithm, called its
main loop, whose input-set includes each variable input
to the algorithm, and which stores into each variable
of the algorithm's result-set.

(2) All arrays of the algorithm assigned non-Ω shape
are k-arrays, i.e., have sizes of N**k.³

(3) Each non-Ω shape associated with an array of the algorithm
consists of j*N equal-sized element-sets. Here, j may
differ depending on the shape, but may not depend on N.
Hence, each element-set of a non-Ω shape of a k-MEA has
size proportional to N**(k-1).

(4) Each k-MEA uses no more than L*N**(k-1) intermediate
variables. Here, L does not depend on N.

Some consequences of our definition of k-MEA follow:

Theorem 1: If two k-MEA's fuse, the result is a k-MEA.

Theorem 2: The inputs to a k-MEA must all be present simultaneously
at some instant in time.

Theorem 3: If two k-MEA's fuse, any input to either except the result
of the first becomes an input to the fusion.


³

We will use the FORTRAN notation ** for exponentiation and
* for multiplication.

Proof of Theorem 1: Suppose A and B are k-MEA's, and C is the fusion
of A to input X of B. Then C is a simple algorithm,
since the initial step of the fusion process merely
lists the statements of A before those of B. The
sequence of two simple algorithms is itself a simple
algorithm. We must show that C is a k-MEA.

We first need to show that C contains one loop,
which stores into each variable of the result of
C, and accesses each variable of each input-set of C.

(1) The result-set of C is defined to be the result-
set of a substitution of B. B, and hence S(B),
are MEA's, and therefore there is a loop, L2,
which stores into each element of S(B)'s
result-set. Furthermore, L2 accesses each
variable input to S(B). In particular, X
is an input to L2. But X's access-characteristic
is not Ω, since A can fuse to X of B.
Therefore, X must be input only to L2, and hence
L2 is the loop of S(B) which fuses with some
loop of S(A). Also, A's result-characteristic
is not Ω, and A is an MEA. Similarly, there
is a loop L1 of S(A) to which each input of A
is input, and which must be the loop which fuses
with L2. The fused loop, L, has, as inputs,
all inputs to S(A) and to S(B) (except X).
The inputs to L thus include all the inputs to
C. Furthermore, L stores into each variable in
C's result-set, for it stores into B's
result-set. Therefore, L satisfies the definition of
a k-MEA's main loop.

(2) All arrays of the original k-MEA's were k-arrays.
Hence all arrays of the fusion, a subset of the
arrays of the original k-MEA's, are k-arrays.

(3) Each shape associated with an array of B
or of A other than X is still the same,
since loop-fusion does not change the
element-sets of inputs or result-sets.
Therefore, each array has a shape consisting
of j*N equal-sized element-sets,
since it did so in the original k-MEA's.

(4) Let A and B be the k-MEA's which fuse to
form C. Then A used no more than j1*N**(k-1)
intermediate variables, B no more than
j2*N**(k-1). C uses, at most, all intermediate
variables of A and of B, plus those
variables used to hold X, which is intermediate
in C. But the fusion reduces the
number of variables needed for X to one
element-set of X, or j3*N**(k-1) variables,
by property (3) of a k-MEA, and the fact that,
since A parallel connects to B at X, X must
have a non-Ω shape in A and in B. Thus, C
uses at most (j1 + j2 + j3) * N**(k-1)
intermediate variables, where j1, j2 and j3 do not
depend on N.

Proof of Theorem 2: The inputs to a k-MEA must all be present
simultaneously just before the algorithm's execution,
for they are all inputs to the same loop of the
k-MEA. By definition of "input", the contents of
each variable just before the loop is executed is
accessed by this loop. But then all variables input
to the loop must have been in existence
simultaneously just before the loop executed. These inputs
are precisely the inputs to the k-MEA, by definition
of k-MEA.

Proof of Theorem 3: The inputs to the main loop of the first algorithm
are clearly inputs to the main loop of the fusion,
because if they were accessed by the first's main
loop, they are now accessed by the fusion's main
loop. Similar reasoning holds for each variable
input to the second's main loop. However, some of
those variables are now not in existence before the
fusion, since in the sequence they were results of
the first k-MEA. Hence, all input variables to either
MEA except the result of the first k-MEA become
input variables of the fusion.

II.17 Canonical k-MEA's

A canonical k-MEA is a k-MEA which computes a matrix assignment
statement satisfying:

No identifier appearing in the expression (right-side)
of the assignment agrees in name with the left-side
variable of the statement.

Equivalently, the result-set of a canonical k-MEA is disjoint from
each of its input-sets.

Theorem: If the result-characteristic of a canonical k-MEA A matches
the access-characteristic of input X of canonical k-MEA B,
then A may parallel-connect to input X of B to form a
fusion C which is itself a canonical k-MEA.

Proof: A and B are MOA's satisfying the hypothesis of the theorem
of Section 13. Therefore, they may fuse to form an MOA.
This MOA is a k-MEA, by Theorem 1 of Section 16. The fusion
k-MEA computes a matrix assignment statement whose expression
results from substituting a systematic substitution instance
of A's expression for each occurrence of X in B. B is
canonical, so that no input-array of B has the same name as B's
result-array. The substitution instance of A can be so chosen
as to arrange that no input array of A is given the same name
as B's result-array. But then C is canonical.


Chapter  III 


In this chapter, we apply the results of Chapter II to matrix
arithmetic expressions. Our goal is an algorithm which compiles
these expressions into canonical 2-MEA's, choosing a compilation which
uses fewest 2-arrays. We will, throughout this chapter, use "array"
to abbreviate 2-array, and "MEA" to abbreviate 2-MEA.


III.1 Elementary Expression Parse-Trees (EEPT's)

In order to study the possible compilations of an expression
into MEA's, it is convenient to examine parse-trees, both of the
expression, and of the available primitive MEA's. The significance
of a parse-tree in expression compilation stems from the fact that,
for the expressions we deal with, a parse-tree is a data-flow diagram
for the expression. That is, if x and y are nodes of a parse-tree,
and y is x's father, then x cannot be evaluated after y, for x's
result is an input to the computation which yields y's result. [x can,
however, be calculated in pieces, at the same time parts of y are being
calculated.] The parse-tree, or its generalization the data-flow
diagram, gives a partial ordering of the times of calculation of the
expressions rooted at each parse-tree node. It is therefore extremely
useful in displaying all possible valid linear orderings of those
computations.
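
To make the partial ordering concrete, the following Python sketch enumerates the valid linear orderings of the operator nodes of a small parse-tree. It is a minimal sketch under stated assumptions: the function name `valid_orders` and the node names are hypothetical, the tree is given as a mapping from each operator node to its operator-node children, and leaves (the input matrices) are assumed to be available from the start.

```python
from typing import Dict, List


def valid_orders(children: Dict[str, List[str]]) -> List[List[str]]:
    """Enumerate every ordering in which each node follows all its children."""
    nodes = list(children)
    orders: List[List[str]] = []

    def extend(done: List[str]) -> None:
        if len(done) == len(nodes):
            orders.append(list(done))
            return
        for node in nodes:
            # A node may be evaluated only after all of its children.
            if node not in done and all(c in done for c in children[node]):
                done.append(node)
                extend(done)
                done.pop()

    extend([])
    return orders


# Parse-tree of (A*B) + (C*D): two products feeding one sum.
example_tree = {"m1": [], "m2": [], "plus": ["m1", "m2"]}
```

For this tree the two products may be evaluated in either order, but the sum must come last, so exactly two orderings are valid.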

Each canonical MEA can be abbreviated by a tree diagram. The
structure of this diagram is a parse-tree of the expression the MEA
computes. The names of the input arrays and the result are suppressed,
since systematic renamings of these variables yield computationally
equivalent algorithms. However, we associate with each leaf of the
tree, and with the root, shapes: the access-characteristics and
result-characteristic of the algorithm abbreviated. The diagram, called an
elementary expression parse-tree (EEPT), conveniently summarizes the
fusion and computation behavior of the algorithm it abbreviates.

Using EEPT's as building blocks, we can devise a technique for
producing a large collection of canonical MEA's. Suppose EEPT-1's
result-characteristic matches the access-characteristic of a leaf L
of EEPT-2. Construct the tree diagram which results from attaching
EEPT-1 to L. This tree is the parse-tree of an expression composed of
the expressions of EEPT-1 and EEPT-2. Furthermore, this new tree
abbreviates an algorithm which is itself a canonical MEA.

To construct the canonical MEA abbreviated by the joining of two
EEPT's, we can proceed as follows:

1. Write EEPT-1's algorithm, A, immediately before EEPT-2's
algorithm, B.

2. Rename the matrices used in A and in B so that A's result-
matrix agrees with the input-matrix of B associated with
the leaf L of EEPT-2 whose access-characteristic matched
the result-characteristic of EEPT-1, and so that the name
of B's result-array does not agree with the name of any
array input to A, or to B.

3. Fuse A and B. This fusion can be accomplished, because
A's result-characteristic agrees with B's
access-characteristic at the common communicating array.

III.2 Alg-Tree Definitions

Let us assume that we are given:

1. An expression's parse-tree, E.

2. A collection of EEPT's, e_1, ..., e_n.

Each EEPT represents an algorithm-class, any one of whose members
can compute the subexpression described by the EEPT's parse-tree. We
wish to assign to each intermediate node of E a method drawn from these
algorithms for computing that node from its descendants. Thus, we
want a correspondence set up between each intermediate node of E and a
unique intermediate node of a unique e_i.

Definition: We say that an elementary expression tree "e" matches
E at x if and only if

1. For each node α of e there exists one and only
one node x of E, its corresponding node.

2. If α, β are nodes of e joined by a line from α
to β labeled i, then so are their corresponding
nodes in E.

3. If α is an intermediate node of e, then its
operator-symbol matches the operator-symbol of
its corresponding node x in E.

4. The root-node of e corresponds to x.

If e matches E at x, then the set of nodes of E corresponding to leaves
of e is termed the fringe-set of e at x. We will often identify the
fringe-set by listing the set of its node-names. Associated with each
line incident on a member of a fringe-set of e at x is the corresponding
line of e. These terminal lines of e have an access-shape characteristic
which is thereby associated with lines of E. The access-shape associated
with line L is termed the line-shape(L). If only one line L is incident
on a node x, we define leaf-shape(x) = line-shape(L). Also, the root
of e is associated with a shape attribute, called the root-shape of e.
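
The matching definition can be sketched in Python for ordinary (non-re-entrant) trees. This is only an illustration under stated assumptions: the `Node` class and the `match` function are hypothetical names, shapes are ignored, a leaf of e is allowed to correspond to any node of E, and the labels on lines are represented by the position of each child.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    op: Optional[str]                      # '+' or '*'; None marks a leaf
    kids: List["Node"] = field(default_factory=list)
    name: str = ""


def match(e: Node, x: Node) -> Optional[List[Node]]:
    """Return the fringe-set of e at x, or None if e does not match E at x."""
    if e.op is None:
        return [x]                         # a leaf of e corresponds to x itself
    if e.op != x.op or len(e.kids) != len(x.kids):
        return None                        # operator-symbols must agree
    fringe: List[Node] = []
    for e_kid, x_kid in zip(e.kids, x.kids):   # lines with equal labels
        sub = match(e_kid, x_kid)
        if sub is None:
            return None
        fringe.extend(sub)
    return fringe


# E is the parse-tree of (P*Q) + R.
P, Q, R = Node(None, name="P"), Node(None, name="Q"), Node(None, name="R")
PQ = Node("*", [P, Q], "PQ")
E = Node("+", [PQ, R], "root")

e_sum = Node("+", [Node(None), Node(None)])                       # x + y
e_big = Node("+", [Node("*", [Node(None), Node(None)]), Node(None)])  # x*y + z
```

Matching `e_sum` at the root of E gives the fringe-set {PQ, R}, while the larger `e_big` reaches down to the fringe-set {P, Q, R}.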

Definition of an alg-tree:

An alg-tree of a node x in E is an assignment of elementary
expression parse-trees (EEPT's) e_i to certain nodes of E. The
assignment satisfies the following construction property:

1. An EEPT which matches E at x is an alg-tree of x in E.

2. If T is an alg-tree of x in E, then if e is an EEPT
"parallel-attachable" to T at some node y of E, then
T extended to the nodes matched by e is also an
alg-tree of x in E.

The nodes in an alg-tree T are the nodes of E assigned EEPT-nodes
by T. The root of an alg-tree T is that node in T having no ancestor
node in T. The fringe-set of an alg-tree T is that set of nodes which
have no immediate descendants in T. The connection-set of an
alg-tree T consists of that set of nodes of T which corresponds to the
root of some EEPT assigned by T to E.

Definition of parallel-attachable:

An EEPT e is parallel-attachable to an alg-tree, T, in
E at a node y if and only if:

1. y is in the fringe-set of T.

2. e matches E at y.

3. The line-shape, s_i, assigned by T to each line i
incident on y equals e's root-shape.

4. Every line incident on y comes from a node in T.
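
The four conditions can be collected into a single test, sketched below in Python. The representation is an assumption made for illustration and is much simpler than a full alg-tree: `AlgTree` records only the fringe-set of T and the line-shapes T assigns to lines incident on each fringe node, `matches_at` is a caller-supplied predicate standing in for "e matches E at y", and `lines_into_y` is the number of lines of E incident on y. All of these names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class AlgTree:
    fringe: Set[str]                    # names of nodes in the fringe-set of T
    line_shapes: Dict[str, List[str]]   # shapes T assigns to lines incident on a node


def parallel_attachable(root_shape: str,
                        matches_at: Callable[[str], bool],
                        tree: AlgTree,
                        y: str,
                        lines_into_y: int) -> bool:
    if y not in tree.fringe:                        # condition 1
        return False
    if not matches_at(y):                           # condition 2
        return False
    shapes = tree.line_shapes.get(y, [])
    if any(s != root_shape for s in shapes):        # condition 3
        return False
    # Condition 4: every line of E incident on y must come from a node in T,
    # i.e. T must have assigned a shape to each such line.
    return len(shapes) == lines_into_y
```

With a re-entrant E, condition 4 fails when y is a common subexpression with a consumer that T has not yet covered; the sketch reports this as a mismatch between the number of lines T knows about and the number incident on y.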


The above definitions allow E the possibility of being a re-entrant
tree, one in which a given subtree may have several ancestors. This
corresponds to a generalization of a parse-tree to a data-flow diagram
for expressions with common subexpressions.

Example: An expression with common subexpressions, such as
(A + B*C) * (B*C + D), would be represented as:


Suppose e is an EEPT which abbreviates an algorithm $\underline{e}$. Let e
match E at x, so that x_1, ..., x_n are the nodes of E matching leaves of
e. Let the result of a node x be the value of the expression whose
parse-tree is rooted at x. If the results of each x_i are stored in arrays
X_i, executing a suitable version of $\underline{e}$ produces the result of x.
We say that the EEPT can compute the result of the node its root matches.

Associated  with  each  alg-tree  A  is  a  canonical  MEA  A  whose  parse- 
tree  matches  the  portion  of  E  in  A.  If  the  results  of  each  node  in 
the  fringe-set  of  A  is  available  to  A,  A  can  compute  the  result  of  the 
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root  of  A.  A  Is  constructed  from  the  algorithms  abbreviated  by  the 
EEPT's  assigned  In  A.  One  way  of  constructing  such  an  associated  al¬ 
gorithm  proceeds  In  parallel  with  the  construction  of  A: 

(1)  Suppose  A  is  an  EEPT  e  which  matches  E  at.  x.  Then  let 
A  be  e's  canonical  MEA. 

(2)  Suppose  A  is  an  alg-tree  T  of  x,  parallel-attached  to 
an  EEPT  e  at  y.  Then  construct  A  from  t,  T's  canonical 
MEA,  and  £,  e's  canonical  MEA,  as  follows: 

1.  Systematically  substitute  a  new  matrix  name,  Z  for 
e/s  result-matrix,  and  for  the  input  ^  of  I  cor¬ 
responding  to  y.  Here,  Z  is  a  name  not  occurring 
in  T  or  e. 

2.  Write  the  substituted  e  before  the  substituted  T 

3.  Parallel  connect  £  to  T,  yielding  a  canonical  MEA. 

The  inputs  to  the  resulting  algorithm  include  all  inputs 
to  T  and  to  e. 

Steps  1  and  2  can  always  be  accomplished.  Step  3  can 
also  be  accomplished,  for  if  A  assigns  e  to  match  y,  the 
root-shape  of  e  must  agree  with  the  leaf-shape  of  y  in  T. 

But  the  root-shape  of  e  is  the  result-characteristic  of 
e  and  the  leaf-shape  of  y  in  T  is  the  access-characteristic 
of  £  in  T.  The  result  of  y  is  needed  only  as  an  input  to 
T,  since  all  lines  in  E  incident  on  y  are  in  T.  Therefore, 

Z  as  computed  by  e  need  be  input  only  to  T.  Therefore,  once 
£  and  T  have  been  placed  in  sequence,  they  can  parallel- 
connect. 

To show that the algorithm produced indeed computes the proper
result, observe that if A is e, we have already demonstrated the result.
Suppose then that A is the fusion of e with input y of T. A is then
computationally equivalent to e;T. But immediately after e is executed
in this sequence, all of T's inputs are available. Since the canonical
MEA of T is T's associated algorithm, it computes the result of the
root of A.

III.3 Major Properties of Alg-Trees

Alg-trees are interesting primarily because they explore the
mechanism by which canonical MEA's may be constructed to fit a given
expression-part. This growth-mechanism is itself important in growing
alg-trees efficiently. Thus, the first of the three major alg-tree
properties concerns only the growth-mechanism, that is, the EEPT's, not
their algorithms. The two additional properties we discuss are primarily
properties of the algorithms which can be grown in parallel with
alg-trees, the "associated alg-tree algorithms" (AATA).

Property  1 

Let G(S,x) be the set of all alg-trees of x whose root's shape is
S. Let H(y,S,x) be the set of alg-trees of y which include x in their
connection-set, and such that the root-shape of the EEPT they assign
to x is S.

Then: For all $U \in G(S,x)$, there is $T \in H(y,S,x)$ such that U is
a sub-alg-tree of T. For all $T \in H(y,S,x)$, there is
$U \in G(S,x)$ such that U is a sub-alg-tree of T.

In  other  words,  the  set  of  subtrees  of  members  of 
H(y,S,x)  rooted  at  x  and  with  root-shape  S  equals 
G(S,x),  the  set  of  all  alg-trees  of  x  with  root- 
shape  S. 

We  will  use  this  result  in  growing  the  alg-trees  rooted  at  y  from 
G(S,x).  Every  alg-tree  of  x  with  root-shape  S  can  be  "extended  upward" 
by  parallel-attaching  it  to  any  EEPT  A  which  matches  some  ancestor- 
node  y  of  x  such  that  x  lies  in  A's  fringe-set,  and  such  that  x's  leaf- 
shape  in  A  is  S.  These  extensions  ultimately  create  all  alg-trees 
rooted  at  y. 

Proof:  Each  member  T  of  H(y,S,x)  contains  a  sub-alg-tree  (which 
may  be  null,  if  x  is  in  T's  fringe-set)  which  is  rooted 
at  x  with  root-shape  S.  Clearly  this  is  a  member  of  G(S,x). 
Furthermore, this sub-alg-tree may be replaced by any member
of G(S,x), each replacement yielding an alg-tree rooted
at y, a member of H(y,S,x).

The remaining major properties of alg-trees concern, in reality, AATA's.

Property  2  Each  AATA  is  a  canonical  MEA. 

Two  consequences  of  Property  2  are: 

a. Each AATA requires only k*N intermediate variables,
since it is a 2-MEA. Thus, the storage internal to an
AATA does not enter the leading term of the polynomial
in N which counts the number of memory cells (variables)
needed by the program.

b. If the AATA of an alg-tree T of x is used to compute
x's result, then the result of every node in
the fringe-set of T must be computed, and be present,
before T's AATA can execute.

Proof:  The  fringe-set  of  T  lists  the  limits  of  the 
fusion  making  up  T's  AATA.  The  results  of 
these  nodes  are  inputs  to  T's  AATA,  and  must 
therefore  be  present  simultaneously  just 
before  the  AATA's  execution,  since  each  AATA 
is  an  MEA. 

Property 3 A non-canonical MEA can be produced from certain canonical
MEA's which are AATA's of alg-trees. By a "non-canonical"
MEA, we mean one whose result-set may be assigned the same
variables as one of the MEA's input-sets. For example,
from the canonical MEA to compute
A ← B * C

we can produce non-canonical MEA's which compute
A ← A * C and A ← B * A.

Such non-canonical MEA's allow us to re-use intermediate
variables immediately, thus avoiding the need to provide
an additional distinct array to hold the result of an MEA.

Theorem: If the result-characteristic of an MEA, A, matches the
access-characteristic of an input I of A, then the result-set
of A may be chosen to be I's input-set. The MEA which
results from this transformation requires one "move" operation
for each element of I's input-set, in addition to the
operations needed by A.

Proof: Suppose A's result-characteristic matches the access-characteristic
of an input I of A. Then, each element-set of the result-set
computed during a single iteration j of A's main loop equals
(in subscript sets) the set $(I)_j$ of elements of I accessed
during iteration j. Set A's result-array equal to I's input
array. Include instructions in A's main loop to perform

U ← $(I)_j$

just before the body of A's main loop. Within the body of
A's main loop, refer to the copy of $(I)_j$ in U whenever the
contents of a variable in $(I)_j$ is accessed. A's result,
now $(I)_j$, is computed into the same variables in $(I)_j$ by
this resulting algorithm. Because the shapes of A's result and
I "match", neither is $\Omega$, and since $(I)_j \cap (I)_i = \emptyset$ if
$i \ne j$, no variables in $(I)_j$ will be accessed on any
iteration other than the jth. Because the element-set of the
result computed during iteration j equals $(I)_j$, no
variable is stored into during iteration j which is not in
$(I)_j$, and hence copied into U.
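The construction in this proof can be illustrated for the row-access product X ← X * B. The following Python sketch is only an illustration under stated assumptions: the function name `multiply_overstore_rows` and the list-of-rows representation are ours, not part of the notation above, and indices run from 0. On each iteration j of the main loop the element-set $(I)_j$ (row j of X) is first moved into the 1 * N buffer U, and row j of the result is then computed into the same variables.

```python
def multiply_overstore_rows(x, b):
    """Compute X <- X * B, storing the result over X itself.

    x and b are N-by-N matrices held as lists of rows (an assumed
    representation). One 1-by-N buffer u is the only extra storage.
    """
    n = len(x)
    for j in range(n):
        # The "move" operation of the theorem: copy (I)_j into U.
        u = list(x[j])
        for c in range(n):
            # Row j of the result depends only on row j of X, now in u,
            # so storing into x[j] cannot disturb any later iteration.
            x[j][c] = sum(u[k] * b[k][c] for k in range(n))
    return x
```

The copy costs one move per element of X, which is the extra cost stated in the theorem.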


III.4 Result-Array and Fringe-Set Array Storage Overlap

At the end of the preceding section we presented a demonstration
that, if the root-shape of an alg-tree matches a leaf-shape, and if
an intermediate array was assigned to that leaf, then that same array
can be used to hold the result. Our demonstration involved copying
the set $(R)_j$ into intermediate storage just before computing $(R)_j$. It
appears that this "move" or copy operation is only a bookkeeping
convenience, and that at no cost in intermediate storage it can be eliminated.
An example follows:

Suppose we wish to compute
X ← X * B

where X and B are 2-arrays, and X is intermediate. We can compute
X (final) with only one 1 * N array of storage and no copy as follows:

Instead of X, allocate Y, an (N+1) * N array, and compute X
(initially) in the first N rows of Y, i.e.

X[I,J] = Y[I,J]

This leaves Y[N+1,J] empty.


Now substitute into algorithm 1 (see "the n basic algorithms") as
follows:

for C[I,J] substitute Y[I+1,J]
for A[I,J] substitute Y[I,J]

Also, reverse the sequence of values computed by the outer loop.

The result is:

Case: X[I,J] = Y[I,J], 1 ≤ I, J ≤ N

I → N (outer loop, run in reverse order: I = N, N-1, ..., 1)
    J → N
        Y[I+1,J] ← 0;
        K → N
            Y[I+1,J] + ← Y[I,K] * B[K,J];

X (final) is computed by this algorithm in the last N rows of Y.

This suggests another case, differing in X's initial position in Y.
Here we again use algorithm 1, this time substituting

Y[I,J] for C[I,J]
Y[I+1,J] for A[I,J]

and running the outer loop in normal order.

Case: Y[I+1,J] = X[I,J]

I → N
    J → N
        Y[I,J] ← 0;
        K → N
            Y[I,J] + ← Y[I+1,K] * B[K,J];

Virtually the same construction may be used in the outer loop of
any canonical MEA whose result-characteristic matches one of its
access-characteristics. The direction of the main loop is determined
by the "position" of the input which is to be overstored in its larger
intermediate array.
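The two cases can be checked with a short Python sketch. It is a minimal illustration, not a definitive implementation: the function names `multiply_shift_down` and `multiply_shift_up` and the list-of-rows representation of Y are assumptions of ours, and indices run from 0, so the extra row is row N in the first case and row 0 in the second.

```python
def multiply_shift_down(y, b):
    """X initially occupies rows 0..N-1 of the (N+1)-by-N array y.

    On return X * B occupies rows 1..N. The outer loop runs in reverse,
    so row i is read before row i is overwritten on the next iteration.
    """
    n = len(b)
    for i in range(n - 1, -1, -1):
        for j in range(n):
            y[i + 1][j] = 0
            for k in range(n):
                y[i + 1][j] += y[i][k] * b[k][j]
    return y


def multiply_shift_up(y, b):
    """X initially occupies rows 1..N of y; on return X * B occupies
    rows 0..N-1. The outer loop runs in normal order."""
    n = len(b)
    for i in range(n):
        for j in range(n):
            y[i][j] = 0
            for k in range(n):
                y[i][j] += y[i + 1][k] * b[k][j]
    return y
```

In both functions each result row is written into a row whose previous contents are no longer needed, so no copy buffer appears; the only extra storage is the one additional row of Y.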

Similar constructions can be used to overstore an input whose
access-shape is "c". However, the array in which the input to such an
algorithm is stored must be allocated somewhat differently: N * (N+1)
rather than (N+1) * N, allowing an extra column. Consideration of the
N * (N+1) arrays (extra-column arrays), and c-access algorithms with
input matrices stored in extra-row arrays, indicates a proliferation of
cases. One simple solution would simply allocate each intermediate
array to be (N+1) * (N+1). This results in only four possible locations
for the (1,1) element of a matrix stored in such an array, T:

T[1,1], T[1,2], T[2,1] and T[2,2]

The appropriate r-access and c-access algorithms can easily be calculated
in each of these cases.

A similar construction applies to any shape of an algorithm such that:

(1) An additional element-set of variables may be allocated as
an extension of the sequence of element-sets which make up
the shape, and

(2) The element-sets of the shape may be computed (and accessed)
in reverse order without changing the result of the algorithm.

When these conditions are satisfied for some shape S, then any canonical
MEA whose result-shape equals S may overstore any of the MEA's input-arrays
whose access-shape also equals S.




III.5 Calculating Intermediate 2-Array Requirements of an Alg-Tree

One of the properties of alg-trees suggests that, in some cases,
a "large" algorithm with many inputs to the same outer loop may require
more space than the equivalent "smaller" (fewer input) algorithms.
This is the "parallelism" inherent in parallel attachment, namely,
that all inputs to an alg-tree must exist simultaneously. Given the
set of inputs to an algorithm, the results at each node of the algorithm's
fringe-set, we must determine the number of arrays needed to evaluate
all members of the fringe-set and which must be simultaneously present.

Suppose we are given a fringe-set, each node i in it requiring
n(i) arrays for its computation. In computing the entire fringe-set,
one intermediate array is needed to hold each node's result unless its
n(i) = 0, for in this case, the member is a leaf of the expression, and
already exists as a program variable. If node i is computed jth among
the fringe-set members, then apparently we need at least

\[ n(i) + j - 1 - k(j) \]

arrays to compute it. Here k(j) is the number of nodes L such that
n(L) = 0 which are to be computed before node i.

Let the number of arrays required to compute the fringe-set S
be N(S).

Then $N(S) \ge n(i) + j - 1 - k(j)$ for each i.

Let S be a fringe-set, a set of nodes numbered arbitrarily 0 to m.

Let J be any permutation on the integers in [0,m].

With each permutation J, we can associate an order for computing the
nodes of S. Namely, if these nodes are numbered 0,...,m, then the
ith node we compute is numbered $J_i$. Thus, we can define

\[ N(J,S) = \max_i \bigl( n(J_i) + i - K(J,i) \bigr) \]

where K(J,i) is the number of k such that k < i and
$n(J_k) = 0$. Then the following result holds:

Theorem: Any permutation J for which

\[ n(J_i) \ge n(J_{i+1}), \qquad 0 \le i < m \]

minimizes N(J,S), and for that permutation

\[ N(S) = \max_{\substack{0 \le i \le m \\ n(J_i) > 0}} \{\, n(J_i) + i \,\} \]

Proof: Let $N(J,S) = \max_i \bigl( n(J_i) + i - K(J,i) \bigr)$, where K(J,i) is the
number of k's such that $n(J_k) = 0$ and k < i.

We have $N(S) = \min_J N(J,S)$.

In a following lemma, we prove that:

If for some i, $n(J_i) < n(J_{i+1})$, then there is a
permutation J' such that $N(J,S) \ge N(J',S)$, where J' is defined:

\[ J'_k = J_k \ \text{if } k < i \text{ or } k > i+1, \qquad J'_i = J_{i+1}, \qquad J'_{i+1} = J_i \]

Therefore, if there is an i such that

\[ n(J_i) < n(J_{i+1}) \]

then there is a permutation J' which interchanges $J_i$ and $J_{i+1}$
and is such that

\[ N(J',S) \le N(J,S) \]

By a sequence of such interchanges, we can arrive at J'',
a permutation in which

\[ n(J''_j) \ge n(J''_{j+1}) \quad \text{for all } j \tag{1} \]

even if we started with a permutation J for which N(J,S)
took on its minimum value. Furthermore, $N(J'',S) \le N(J,S)$.

All permutations J'' satisfying (1) produce identical
values of N(J'',S).

Therefore, N(S) = N(J'',S) for all J'' satisfying (1).

Lemma: If for some i, $n(J_i) < n(J_{i+1})$, then there is a permutation
J' such that $N(J,S) \ge N(J',S)$, where J' is defined:

\[ J'_k = J_k \ \text{if } k < i \text{ or } k > i+1, \qquad J'_i = J_{i+1}, \qquad J'_{i+1} = J_i \]

Proof: Let $T_i = n(J_i) + i - K(J,i)$.

Then $N(J,S) = \max_i T_i$.

Similarly, let $T'_i = n(J'_i) + i - K(J',i)$,
so $N(J',S) = \max_i T'_i$.

Suppose $n(J_i) < n(J_{i+1})$.

Then $n(J_{i+1}) > 0$, and either $n(J_i) = 0$ or $n(J_i) > 0$.

Case: $n(J_i) = 0$.

Then $K(J',i) = K(J',i+1) = K(J,i)$,
while $K(J,i+1) = K(J,i) + 1$.

\[ T_{i+1} = n(J_{i+1}) + i + 1 - K(J,i+1) = n(J_{i+1}) + i - K(J,i) \]

\[ T'_i = n(J_{i+1}) + i - K(J',i) = n(J_{i+1}) + i - K(J,i) \]

Therefore, $T'_i \le T_{i+1}$.

Also,

\[ T'_{i+1} = n(J_i) + i + 1 - K(J',i+1) = i + 1 - K(J,i) \]

Therefore, $T'_{i+1} \le T_{i+1}$, since $n(J_{i+1}) \ge 1$.

Case: $0 < n(J_i) < n(J_{i+1})$.

Then $K(J,i) = K(J,i+1) = K(J',i) = K(J',i+1)$.

\[ T_{i+1} = n(J_{i+1}) + i + 1 - K(J,i+1) \]

\[ T'_i = n(J_{i+1}) + i - K(J',i) = n(J_{i+1}) + i - K(J,i+1) \]

Therefore, $T'_i < T_{i+1}$.

\[ T'_{i+1} = n(J_i) + i + 1 - K(J',i+1) = n(J_i) + i + 1 - K(J,i+1) \]

But $n(J_i) < n(J_{i+1})$ by hypothesis,
so $T'_{i+1} < T_{i+1}$.

For all k < i or k > i+1,

\[ T'_k = T_k \]

Therefore, we have shown that for all x, there
is a k such that

\[ T_k \ge T'_x \]

In particular, for that x which maximizes $T'_x$, there
is a k such that

\[ T_k \ge T'_x = N(J',S) \]

Since $N(J,S) \ge T_k$,

\[ N(J,S) \ge N(J',S) \]

Thus far we have shown how to calculate the number of arrays needed
both to compute each input to an algorithm, and to hold those inputs
simultaneously just before the algorithm executes. We have not, however,
shown how many are needed to complete the algorithm's execution, including
computation of its result. This number, n(S), depends on whether the
algorithm's result may overstore a fringe-set array, or not. We claim:

\[ n(S) = \begin{cases} N(S) & \text{if the result may occupy an array holding one of the algorithm's inputs,} \\ N(S \cup I) & \text{otherwise.} \end{cases} \]

Here, I is a node-name distinct from all the names of nodes of S,
and such that n(I) = 1.

When the result may not overstore an array of the alg-tree's
fringe-set, we must account for the possibility that the members of
the fringe-set each require more than one array in their computation,
but of course only one to hold their result. Let the non-zero-members-of-S
be the set of members x of S such that n(x) > 0. Then

\[ n(S) = \max \bigl( N(S),\ \text{number of non-zero-members-of-S} + 1 \bigr) \]

But $N(S \cup I)$ computes precisely this: since n(I) = 1, I may be taken
last among the non-zero members in a non-increasing order, where it
contributes the number of non-zero-members-of-S plus 1.

We can say, then, that

n(S) = N(S) if: (1) the root-shape of the alg-tree whose
fringe-set is S matches the leaf-shape of
some node x in S, and
(2) x is not a leaf of E,
and $n(S) = N(S \cup I)$ otherwise.

The second condition is required, for we cannot change the value
of any variable of the expression in computing that expression.
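The two cases of n(S) can be sketched in Python as follows. This is an illustration under assumptions of ours: a fringe-set is given as a list of node values, the decision whether the result may overstore an input is passed in as a boolean rather than derived from shapes, and the function names are hypothetical.

```python
def arrays_to_hold_fringe(values):
    """N(S) for a fringe-set given by its node values n(i)."""
    ordered = sorted(values, reverse=True)
    return max((v + i for i, v in enumerate(ordered) if v > 0), default=0)


def arrays_to_execute(values, result_may_overstore):
    """n(S): arrays needed to compute the inputs and then the result.

    result_may_overstore stands for conditions (1) and (2) above; how it
    is decided is outside this sketch.
    """
    if result_may_overstore:
        return arrays_to_hold_fringe(values)
    # Adjoin the fictitious node I with n(I) = 1 to reserve a result array.
    return arrays_to_hold_fringe(list(values) + [1])
```

With node values 2, 1, 0 the fringe-set alone needs two arrays; when the result may not overstore an input, a third is required, in agreement with the maximum of N(S) and the number of non-zero members plus one.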


Property 1 of an alg-tree enables us to avoid some of the redundant
alg-tree growing that simple application of our minimization
method would require. We can record, for each shape S at each node,
all the distinct alg-trees with shape S. We can then generate these
sets for any node x, given that they have been generated and recorded
for all descendants of x.

To generate all members of G(S,x), where x is not a leaf of E:

(1) Choose an EEPT e, whose root-shape equals S, which matches
E at x. Let the nodes of E corresponding (in the match)
to leaves of e be $y_i$, certain descendant nodes of x.
Let the leaf-shape of each $y_i$ be $s_i$.

(2) For every i, choose a member $F_i$ of $G(s_i,y_i)$. Let the set
of alg-trees so chosen be $\{F_i\}$.

(3) Extend the assignment B of e to E made in (1) to include
all assignments of the $F_i$. B', the extension of B, is defined
to assign to each node of E which is assigned an EEPT node by an
$F_i$ or by B that same EEPT node.

(4) Each distinct choice in step (1) or (2) generates a new
alg-tree in (3). Repeat until all choices are made.

(5) Repeat steps 1-4 for each distinct shape S.

(6) Steps 1-5 create a set of alg-trees $B_i$ rooted at x.
Provide for the possibility that x may be in the fringe-set
of some alg-trees rooted at ancestors y of x, by
computing n(x), an integer giving the minimum number
of arrays needed to compute x regardless of which $B_i$ is
used. A "null alg-tree", which has only one input, x,
having n(x) as its array requirement, must then be
added to G(S,x) for each S.

When x is a leaf of E, compute G(S,x) = the null alg-tree,
with a single member in its fringe-set, x. n(x) = 0, for no
intermediate arrays are needed in computing a leaf of E. Leaves
of E are assumed to be computed before any subexpression of E
is evaluated.

Theorem: If no node in the parse-tree E has more than one line incident
on it, we can verify that this algorithm computes all, and
only, the members of G(S,x) for each S.

Proof: To show that each object produced by step (3) is a member of
G(S,x) for the S chosen in (1), we must show that each such
object is an alg-tree rooted at x, with root-shape S. In
step (3) we extend an assignment of an EEPT e to x to other
nodes of E. We must show that the resulting assignment is
an alg-tree. Clearly, by the construction rule for alg-trees,
e's assignment to x is an alg-tree, T. e's ith line-shape,
$s_i$, is assigned a line incident on node $y_i$ of E by T. $F_i$ is
an alg-tree rooted at $y_i$ whose root-shape is $s_i$, by step (2).
This means that the EEPT $f_i$ whose root is assigned $y_i$ by $F_i$
has root-shape $s_i$. Also, $f_i$ matches E at $y_i$. Since no
node of E has more than one line incident on it, $y_i$'s only
incident line has line-shape $s_i$. All lines incident on $y_i$
have shape $s_i$, and lie in T. Therefore $f_i$ is parallel-attachable
to T at $y_i$, and hence the extension of T to include
$f_i$ is an alg-tree rooted at x with root-shape S. Succeeding
extensions can be made to include all EEPT's used in $F_i$'s
construction. Similar reasoning shows that T may be extended
to each $F_i$. Hence, each object produced by step (3) is a
member of G(S,x).

To see that all members of G(S,x) are produced by the
leaves-in algorithm, assume the contrary. Then there is
a $T \in G(S,x)$ not produced for any choices of e's and $F_i$'s
made in steps (1) and (2). T differs from each assignment
produced by the leaves-in algorithm in at least one node.
T must, however, assign to x an EEPT e which matches E at
x and has root-shape S. Since e matches E at x in only
one way, each node of E assigned by T to nodes of e matching
E at x is assigned that same node by one of the choices of
(1). In particular, the leaves of e are assigned the same
nodes $y_i$ and shapes $s_i$ in T as in some choice of EEPT made
by (1). Thus, $T \in H(x,s_i,y_i)$ for each i. Therefore for each
i there is an $F_i \in G(s_i,y_i)$ which is a sub-alg-tree of T.
Some choice made by (2) selects precisely these $F_i$'s for
every i. Then T cannot differ from this selection on any
node in an $F_i$. But T assigns only nodes of E in an $F_i$, or
which match nodes of e. Thus, there is one selection of
choices in steps (1) and (2) from which T cannot differ,
contradicting the assumption.

In performing step (6) of the leaves-in algorithm, we find that
we must compute the minimum number of arrays needed to compute the
result of a node x. This in turn requires an evaluation of the number
of arrays needed for the execution of each alg-tree rooted at x. These
numbers depend on the array requirements of the nodes in each alg-tree's
fringe-set. We will speak of "the value of" an alg-tree A, or a node
x, when we mean the minimum number of arrays needed for the execution
of A to yield x's result, or for the computation of x's result by the
alg-tree rooted at x whose value is least.

If we were to represent alg-trees as EEPT-node identifiers attached
to certain nodes of E, linked together in some way, each time we needed
to compute an alg-tree's value, that alg-tree's fringe-set would have
to be obtained. A better representation for alg-trees avoids much of
this computation. We could represent alg-trees by their fringe-sets.

In order for this change of representation to save computation, we would
like to show that operations analogous to the steps of the leaves-in
algorithm can be performed on fringe-sets, to yield new fringe-set
representations directly. But only step (3), the step which produces a new
alg-tree, depends on alg-tree representations. If each of the inputs
to step (3) were represented as alg-tree fringe-sets, step (3) could
produce the fringe-set of the extension F of e to the $F_i$ by set-uniting
all the $F_i$ fringe-sets. [Each node in the fringe-set of F must have
been a fringe-set node of some $F_i$. Similarly, each node in the
fringe-set of some $F_i$ is in the fringe-set of F.] The change of representation
is thus desirable.

A still more desirable representation of alg-trees presents itself.
In computing the value of each alg-tree, we apply the function n to the
alg-tree's fringe-set, S, yielding n(S). n(S) ultimately requires the
evaluation of N(S) (or $N(S \cup I)$). Recall that a fringe-set S is a
certain collection of nodes in a graph. In evaluating N(S), the names
of the nodes in S are irrelevant. N(S) requires only $n(S_i)$, the values
of the nodes in S, for its computation. This suggests that each
fringe-set, represented as a list of integer node-names $(y_1,\ldots,y_m)$, be
associated with the fv-set $(n(y_1),\ldots,n(y_m))$. The fv (fringe-value)
set we will represent as a string of integers separated by spaces. Each
integer $n(y_i)$ is the value of some node $y_i$ in the associated fringe-set
and is the number of arrays needed in computing $y_i$ by that algorithm
which uses fewest arrays.

We will extend the functions N(S) and n(S) to apply to fv-sets.
If S' is the fv-set associated with S, with $S'_i$ being the integers in S',

\[ N(S') = \max_{\substack{1 \le i \\ S'_i > 0}} \bigl( S'_i + i - 1 \bigr) \]

where $S'_i \ge S'_{i+1}$ for all $S'_i$, $S'_{i+1}$ in S'.

Thus, N(S') = N(S) (cf. the definition of N(S)).


Each alg-tree can be represented by its fv-set during the leaves-in
algorithm. We must describe once more how step (3) of the leaves-in
algorithm can be modified to accommodate the new representation. So long
as there is no node y of E with more than one line incident on y (an
assumption which the leaves-in algorithm requires in any case), no
two fringe-sets united by the modified step (3) have nodes in common.
For two fringe-sets united by step (3) must include descendants of two
distinct nodes $y_i$ and $y_j$ only. Furthermore, $y_i$ is neither a descendant
nor ancestor of $y_j$, for both are members of the fringe-set of an
alg-tree, and nodes of a fringe-set of an alg-tree have no descendants in
that alg-tree, and hence in that fringe-set. The lack of nodes in the
tree with more than one incident line implies that the set of descendants
of $y_i$ is disjoint from the set of descendants of $y_j$. Hence,
the fringe-sets united by step (3) include no nodes in common.


The fact that two fringe-sets united by the modified step (3)
have no nodes in common suggests a simple extension of fringe-set
set-uniting to the fv-sets of those fringe-sets. We can define the
"join" of two fv-sets $A = (A_1,\ldots,A_m)$ and $B = (B_1,\ldots,B_r)$ to be
$(A_1,\ldots,A_m,B_1,\ldots,B_r)$. That is, the join of A and B, written $A \cup B$,
is a collection of integers consisting of every integer appearing in A
or in B, repetitions retained. The number of integers in the join is the
sum of the number of integers in A and the number in B. Step (3) of the
leaves-in algorithm can be modified to read:

(3') Join the fv-set representations of all the $F_i$ to
produce the fv-set representation of a new alg-tree.
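Steps (2) and (3') can be sketched together in Python. The sketch rests on assumptions of ours: fv-sets are held as tuples of integers, the sets $G(s_i,y_i)$ for the leaves of one chosen EEPT are supplied as lists of such tuples, and the names `join` and `extend_over_leaves` are hypothetical. It shows only the combinatorial core: one new fv-set for each way of choosing one $F_i$ per leaf.

```python
from itertools import product


def join(*fv_sets):
    """Join fv-sets: every integer of every operand, repetitions retained."""
    joined = ()
    for fv in fv_sets:
        joined += tuple(fv)
    return joined


def extend_over_leaves(leaf_choices):
    """Steps (2) and (3') for one EEPT e chosen in step (1).

    leaf_choices[i] lists the fv-sets recorded in G(s_i, y_i) for the
    ith leaf of e. Each combination of one choice per leaf yields the
    fv-set of one new alg-tree.
    """
    return [join(*choice) for choice in product(*leaf_choices)]
```

For example, if the first leaf offers the fv-sets (0) and (1 2) and the second offers only (3), two new fv-sets result, (0 3) and (1 2 3). Because the fringe-sets being united are disjoint, concatenation loses no information.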

Further information will be associated with each fv-set computed,
in objects called the "tags" of each fv-set. Tags explicitly represent
alg-trees, which, although not needed during the leaves-in algorithm,
nonetheless must be recoverable, for the alg-trees constitute the
desired output of the procedure. Alg-trees will be explicitly represented
by linking each fv-set F produced with the fv-sets $F_i$ joined
in step (3') to form F. Tags will also hold additional information
associated with each fv-set, notably the root-EEPT number, and the
fringe-shape-set. Except for the fringe-shape-sets, none of the
information in tags is essential in computing fv-sets.

The root-EEPT of an alg-tree A is the EEPT assigned by the alg-tree
to the node at which A is rooted. Each fv-set when generated is
placed in one of the sets G(S,x). Such sets are termed shape-sets. At
each node one shape-set is produced for each distinct EEPT root-shape.
Shape-sets represent information about the fv-sets they contain,
information which is needed by the leaves-in algorithm.

Fringe-shape-sets enable step (6) to compute n(S) from N(S), where
S is an fv-set. Recall that

n(S) = N(S) if some node x in the fringe-set S satisfies:

(1) the leaf-shape of x matches the root-shape of
S's alg-tree; and

(2) x is not a leaf of E.

$n(S) = N(S \cup I)$ otherwise.

Fringe-shape-sets record, associated with each fv-set, sufficient
information to allow these two cases to be distinguished. A fringe-shape-set
F(S), where S is its associated fv-set, is a subset of all possible shape
names. Presence of a shape name t in F(S) records the presence, in S's
fringe-set, of some node x satisfying (2), whose leaf-shape equals t.
Suppose C is an fv-set belonging to shape-set G(S,x). Then, if F(C) is
C's fringe-shape-set, the proper evaluation of n(C) can be determined:

n(C) = N(C) if $S \in F(C)$,

$n(C) = N(C \cup 1)$ otherwise.

Here $C \cup 1$ is an fv-set formed by joining an extra integer 1 to C.
The introduction of a fictitious node I to achieve the proper value is
no longer necessary.

Fringe-shape-sets must be computed along with fv-sets. Thus, again
using F(C) to denote the fringe-shape-set of fv-set C, if $G_i$ are the
fv-sets to be joined in step (3') to form fv-set G, compute as well

\[ F(G) = F\Bigl( \bigcup_i G_i \Bigr) = \bigcup_i F(G_i) \]

Here $F(A) \cup F(B)$ is just the set union of F(A) and F(B). Clearly, if
shape-name t occurs in, say, F(A), it means that some node i of A's
fringe-set has leaf-shape t. In the fusion, i will retain leaf-shape
t. Hence, t should occur in $F(A \cup B)$. Furthermore, if t occurs in
neither F(A) nor F(B), no member of the fringe-set of $A \cup B$ will have
leaf-shape t (for no nodes other than those in the fringe-set of A
or the fringe-set of B enter the fringe-set of $A \cup B$), and hence t must
not be in $F(A \cup B)$. Occasionally, we will need to represent
fringe-shape-sets explicitly, as in examples. We will represent (on paper) a
fringe-shape-set F(C) by listing, immediately after fv-set C, the
shape-abbreviations contained in F(C). Thus, if

C is an fv-set containing the integers 1, 1, 2, 3,
and F(C) contains the shapes r, c,

we represent C and F(C) together as:

(1 1 2 3) r c
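A tagged fv-set carrying its fringe-shape-set can be sketched in Python as follows. The class `TaggedFvSet`, its field names, and the functions below are assumptions made for illustration; of the tags described above only the fringe-shape-set is modelled, and the links and root-EEPT number are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedFvSet:
    values: tuple       # the fv-set, e.g. (1, 1, 2, 3)
    shapes: frozenset   # the fringe-shape-set, e.g. {"r", "c"}


def join_tagged(a, b):
    """Join two fv-sets and unite their fringe-shape-sets."""
    return TaggedFvSet(a.values + b.values, a.shapes | b.shapes)


def arrays_to_hold(values):
    """N(C) for an fv-set C."""
    ordered = sorted(values, reverse=True)
    return max((v + i for i, v in enumerate(ordered) if v > 0), default=0)


def value_of(c, root_shape):
    """n(C) for an fv-set C placed in the shape-set with this root-shape."""
    if root_shape in c.shapes:
        return arrays_to_hold(c.values)       # the result may overstore an input
    return arrays_to_hold(c.values + (1,))    # join an extra integer 1
```

Joining (1 1) r with (2 3) c gives (1 1 2 3) r c, the example written out above. Its value is 4 in a shape-set whose root-shape is r or c, and 5 in any other.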

To complete the computation of fringe-shape-sets, we must produce
a representation for the "null alg-tree", added in step (6) to each
shape-set G(S,x). The only nodes of E assigned by the null alg-tree are
in its fringe-set. This fringe-set consists of a single node, x. The
fv-set representation of this alg-tree would therefore be FN = (n(x)).
We can compute $Z = \min_i n(C_i)$, where each $C_i$ is an fv-set generated in
shape-set G(S,x) for some S (including $S = \Omega$). Each $C_i$ represents a
method for computing x. That among the $C_i$ which requires fewest arrays
for x's computation is chosen; call it $C_j$. The number of arrays required
is $n(C_j)$; hence the computation of Z gives n(x). The links of FN, which
will represent the alg-tree to be used to compute x, should be copies of
the links of $C_j$. We argue that the fringe-shape-set of the copy of (n(x))
added to G(S,x) should be {S}. For FN in reality represents the computation
of x into an intermediate array. When FN is added to G(S,x), any fusion
with this copy of FN will, by step (2) of the leaves-in algorithm, access
this array by shape S. Hence, the fringe-set of FN contains a node, x,
which is not a leaf of E, and whose leaf-shape is S. Therefore F(FN) = {S},
since x is the only node in the fringe-set of FN.

One more refinement of our algorithm can be introduced. Step (2)
of the leaves-in algorithm requires us to select a member of $G(s_i,y_i)$
for each leaf i of EEPT e. But what if $s_i = \Omega$? Such a shape is not
defined to "match" another of the same name. We resolve this difficulty
simply. We arrange that $G(s_i,y_i)$ contain only the fv-set $(n(y_i))$. This
fv-set represents the null alg-tree, with which an alg-tree whose
leaf-shape is $\Omega$ at $y_i$ may "fuse". Furthermore, since in reality $y_i$ will
belong to the fringe-set of the fusion, treating the fv-set $(n(y_i))$
consistently as an fv-set introduces the integer $n(y_i)$ into each fv-set
F for which $y_i$ belongs to F's fringe-set, and to no others. We accomplish
this by adding Step (7) to the leaves-in algorithm.

(7) Replace $G(\Omega,x)$ with the null alg-tree rooted
at x. The representation of this alg-tree is
the fv-set

FN = (n(x))

F(FN) is empty.

The handling of the leaves L of E is straightforward. Clearly only
the null alg-tree matches a leaf of E. Hence, each shape-set G(S,L)
contains only FN = (0), for n(L) = 0, because no intermediate arrays are
required to compute a leaf. Furthermore, because L is a leaf of E,
F(FN) is empty in each shape-set, for the only node in the fringe-set
of FN is L, which is a leaf of E. Hence, no shapes occur in the
fringe-shape-set of any copy of FN.

From  now  on  we  will  discuss  tagged  fv-sets  and  the  alg-trees  they 
represent  interchangeably.  Each  of  the  terms  here  defined  for  alg-trees 
can  be  extended  to  apply  to  the  tagged  fv-sets  representing  alg-trees. 
Thus,  we  will  speak  of  the  root-shape  of  fv-set  G,  meaning  the  root- 
shape  of  the  alg-tree  G'  whose  fv-set  is  G,  and  which  the  tags  of  G 
represent,  etc. 

III.7 Effort Estimates Motivating Search Reduction

It  is  worthwhile  at  this  point  to  make  an  estimate  of  the  number 
of  alg-trees  we  must  consider.  The  time  we  spend  in  optimizing  an 
expression  is  likely  to  be  directly  related  to  this  number. 

One of our early formulations of this problem suggested that a
very large number of alg-trees would have to be considered. We supposed
that a maximal alg-tree rooted at a node x was given. The leaves of
such an alg-tree either coincide with leaves of E, or have null leaf-
shape, so that no EEPT can be parallel attached to them. We then
observed that each "pruning" of a branch of this alg-tree resulted in a
new alg-tree. We can calculate the number of such alg-trees derivable
by such branch paring.

     Let C(x) be the number of alg-trees derivable by branch-paring
     from a given alg-tree rooted at x.

     Each pruning of a descendant of x can be combined with the
     prunings of any other descendant of x to yield distinct
     alg-trees. We get

          C(x) = C(x1) * C(x2) + 1

     where x1 and x2 are the immediate descendants of x. The "1"
     is added to account for the alg-tree resulting when all nodes
     but x are pruned away. When x is a leaf, it has no descendants,
     so C(leaf) = 1.
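
The recurrence can be exercised directly on small trees. The sketch below is only an illustration: it assumes a hypothetical encoding in which a leaf is `None` and an operator node is a `(left, right)` pair, and the name `count_prunings` is not taken from the thesis.

```python
def count_prunings(tree):
    """Return C(x) for the alg-tree rooted at `tree`.

    Assumed encoding (hypothetical): None is a leaf, (left, right) is a
    node with two immediate descendants.
    """
    if tree is None:
        return 1  # C(leaf) = 1
    left, right = tree
    # Prunings of the two branches combine independently; the extra 1
    # counts the tree in which everything below x is pruned away.
    return count_prunings(left) * count_prunings(right) + 1


leaf = None
height_1 = (leaf, leaf)
height_2 = (height_1, height_1)
skewed = (height_1, leaf)

print(count_prunings(height_1), count_prunings(height_2), count_prunings(skewed))
```

The product term is what makes the count grow so quickly: each additional level squares the number of combinations.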

This function suggests a rather large number of possibilities. In a
binary, symmetric alg-tree, its value is greater than (√2)**n, where n
is the number of non-leaf nodes in the tree. This motivated us to
search for strategies which reduced the cost of searching this
"tree-pruning" space.

The  exponential  nature  of  the  dependence  was  based  on  the  "indepen¬ 
dence"  of  the  operation  on  each  branch.  Each  branch  must  be  pruned  in 
all  possible  combinations  with  the  prunings  of  other  branches.  We 
searched  for  a  method  which  would  decide  how  short  any  one  branch  should 
be,  regardless  of  the  remaining  branches. 

The fringe-sets a branch B gives rise to are ultimately set-united
with fringe-sets arising from other branches. We hoped to avoid
generating these "other branch" fringe-sets. We therefore searched for a
criterion which would allow two interchangeable fv-sets, A and B, both
united with the same externally generated fv-set, C, to be compared.
Specifically we need:

     N(A ∪ C) ≥ N(B ∪ C) for all C.

Such a criterion was discovered. It promises to drastically reduce
the number of fv-sets we need consider in each shape-set, by allowing
us to discard sets like A, which are known to be no better than a
set, B, which we retain.


[Figure: a maximal alg-tree and all of its prunings.]

We compute C(d), where d is the distance from the leaves of a
node x in a binary symmetric tree, so that C(d) = C(x).

We get

     C(d) = [C(d-1)]**2 + 1
     C(0) = 1

In contrast, the number of intermediate nodes in a binary symmetric
tree of height d is μ(d), where

     μ(d) = 2μ(d-1) + 1
     μ(0) = 0

Here, the height of a symmetric tree is the distance from its
root to any of its leaves; intermediate nodes of a tree are non-leaf
nodes. The number of operators in an expression equals the number of
intermediate nodes of that expression's parse-tree.

     μ(d)    d    C(d)

       0     0      1
       1     1      2
       3     2      5    (our example)
       7     3     26  = 5**2 + 1
      15     4    677  = 26**2 + 1

We will show that

     C(d) > 2**[μ(d-1) + 1]   for all d ≥ 2

and, since

     μ(d-1) + 1 = 1 + [μ(d) - 1]/2 = [μ(d) + 1]/2

that

     C(d) > 2**[μ(d)/2] = (√2)**μ(d)

Proof: by induction on d.

Case: d = 2.

     C(2) = 5.  μ(d-1) = μ(1) = 1.  2**(μ(d-1) + 1) = 2**2 = 4

     C(d) = C(2) = 5 > 4 = 2**[μ(1) + 1] = 2**[μ(d-1) + 1]

Case: d > 2. Assume the conclusion for d - 1.

     C(d) = [C(d-1)]**2 + 1 > [C(d-1)]**2 > [2**(μ(d-2) + 1)]**2

     C(d) > [2**(μ(d-2) + 1)]**2 = 2**[2μ(d-2) + 2]

     2μ(d-2) + 2 = [2μ(d-2) + 1] + 1 = μ(d-1) + 1
therefore

     C(d) > 2**[μ(d-1) + 1]
holds for all d ≥ 2.

Thus, if v = the number of operators in an expression, we will need
to investigate somewhat more than

     2**[(v+1)/2]

tree prunings.
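
As a numerical cross-check of the table and of the bound, the two recurrences can be evaluated for small d. This is a minimal sketch; the function names `C` and `mu` simply mirror the symbols in the text, and integer arithmetic is used throughout so that no rounding is involved.

```python
def C(d):
    """Number of prunings of a symmetric binary alg-tree of height d."""
    return 1 if d == 0 else C(d - 1) ** 2 + 1


def mu(d):
    """Number of intermediate (non-leaf) nodes in such a tree."""
    return 0 if d == 0 else 2 * mu(d - 1) + 1


table = [(mu(d), d, C(d)) for d in range(5)]
for row in table:
    print(*row)

for d in range(2, 7):
    # The bound proved by induction: C(d) > 2**[mu(d-1) + 1].
    assert C(d) > 2 ** (mu(d - 1) + 1)
    # The identity mu(d-1) + 1 = [mu(d) + 1]/2, written without division.
    assert 2 * (mu(d - 1) + 1) == mu(d) + 1
    # C(d) > 2**[mu(d)/2], squared on both sides to stay in integers.
    assert C(d) ** 2 > 2 ** mu(d)
```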


The comparison technique motivated by the "maximal alg-tree"
algorithm can be profitably applied to the "leaves-in" algorithm. The
effort  required  by  the  leaves-in  algorithm  is  very  similar  to  that 
required  by  the  maximal  alg-tree  algorithm.  It  can  be  calculated  as 
follows: 

     Let D(x,S) = the number of alternative alg-trees rooted
                  at x with root-shape S.

     Then D(leaf,S) = 1, for each root-shape S.

     At a non-leaf node, x, this number depends on the number of
EEPT's with root-shape S, as well as the number of alg-trees in the
shape-set of each descendant node x_i.

     When we choose EEPT K which matches E at x, a (shape, node) pair
is determined for each leaf of K rooted at x. Let these pairs be

     (S_K1,x_K1), ..., (S_KJ,x_KJ)

The number of combinations, each representing a possible
alg-tree rooted at x choosable in this way, is:

      Π  D(x_Ki,S_Ki)
      i

Thus

     D(x,S) =   Σ    Π  D(x_Ki,S_Ki)  +  1
               K∈L   i

where L indexes the EEPT's with root-shape S matchable to E
at x.

The term 1 arises from the need to consider n(x) a member of
each shape-set, representing the computation of an array holding the
result of x.

Let us assume that each EEPT K is a single-operator binary
tree, so that its two leaves coincide with the sons of x when
the EEPT is rooted at x.

Let us further assume that, for each shape S, there exists only
one EEPT having a root-shape equal to S.

     Then D(x,S) = D(x1,S_K1) * D(x2,S_K2) + 1

     where x1 and x2 are the sons of x.

Assuming that the tree is symmetric, and that

     D(x,S_i) = D(x,S_j) for all shapes

we have:  D(x,S) = D(x1,S) * D(x2,S) + 1,

          D(leaf,S) = 1

or, in a symmetric tree containing w intermediate nodes,

     D(x,S) > 2**(w/2)
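
The sum-of-products recurrence for D(x,S) can be sketched as follows. Everything about the encoding is an assumption made for illustration: `eepts_at(x, shape)` is a hypothetical callback returning, for each matching EEPT, its list of (leaf-shape, node) pairs, and `single_operator_eepts` encodes the two simplifying assumptions stated in the text (one single-operator binary EEPT per shape).

```python
from math import prod


def count_alg_trees(x, shape, eepts_at, is_leaf):
    """D(x, S): number of alternative alg-trees rooted at x with root-shape S."""
    if is_leaf(x):
        return 1  # D(leaf, S) = 1
    total = 1  # the "+ 1" term for the (n(x)) member of the shape-set
    for leaf_pairs in eepts_at(x, shape):  # one entry per EEPT K in L
        total += prod(
            count_alg_trees(node, leaf_shape, eepts_at, is_leaf)
            for leaf_shape, node in leaf_pairs
        )
    return total


# Hypothetical tree encoding: None is a leaf, (left, right) an operator node.
def is_leaf(x):
    return x is None


def single_operator_eepts(x, shape):
    """Exactly one EEPT per shape, whose two leaves are the sons of x."""
    left, right = x
    return [[(shape, left), (shape, right)]]


h1 = (None, None)
h2 = (h1, h1)
h3 = (h2, h2)
print(count_alg_trees(h2, "r", single_operator_eepts, is_leaf))
```

Under those simplifying assumptions the recurrence collapses to the same form as C(d), which is why the two algorithms show the same exponential growth.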


CHAPTER  IV 


Introduction 


Chapter  III  demonstrated  that  the  cost  of  choosing  an  optimum 
compilation  of  a  given  matrix  arithmetic  expression  appears  to  grow 
exponentially  with  the  size  of  the  expression.  The  present  chapter  is 
devoted  to  demonstrating  a  result  which  reduces  this  exponential 
dependence  on  the  expression-size  to  linear  dependence.  The  result, 
called  the  "comparison  theorem",  allows  two  interchangeable  fv-sets 
(and  hence,  the  alg-trees  they  represent)  to  be  "compared". 

Each  alg-tree  rooted  at  x  was  retained,  in  the  leaves-in  algorithm, 
to  allow  it  to  become  a  part  of  a  "larger"  alg-tree.  The  number  of 
arrays  this  larger  alg-tree  requires  depends  not  only  on  the  alg-tree 
rooted  at  x,  from  which  it  was  generated,  but  on  the  alg-trees  rooted 
outside  x  which  are  also  part  of  the  larger  alg-tree. 

Suppose alg-trees A and B are both members of G(S,x). Then whenever
A can be parallel connected to some EEPT which matches E at y, some
ancestor node of x, so can B. The number of arrays needed in computing
y via A, together with some alg-trees C rooted at other descendants of
y than x, is n(A ∪ C). N(A ∪ C) is a major component of n(A ∪ C). The
comparison theorem is capable of deciding, by examining only A and B,
whether

     N(A ∪ C) ≤ N(B ∪ C)

without generating all the possible alg-trees C rooted outside x with
which A and B might fuse. The comparison theorem itself gives necessary
and sufficient conditions on A and B for the statement:

     (1)  for all C, N(A ∪ C) ≤ N(B ∪ C)

to hold. These conditions are independent of C.¹

     ¹ The trick of generalizing over a variable to derive conditions
     independent of that variable may work for other comparison
     predicates. This would suggest its use in exhaustive searches.
     Sufficient "structure" must exist in the space being searched to
     allow a concept analogous to "interchangeable fringe-sets" to
     exist. Also, the comparison theorem in other searches may lack
     power (perhaps only holding between identical partial states) or
     applicability (perhaps few comparable pairs are ever produced).
     However, when proper conditions hold, it appears to be a powerful
     search-space reducing operator.

While the quantification makes this predicate independent of C,
its determination would be too time consuming if every C had to be
generated before the predicate could be computed. Thus, we seek a
new predicate equivalent to (1), which does not specifically mention C.

In order to derive a predicate equivalent to (1), but not involving
C, we investigate in some detail the evaluation rule for N(S), where S
is an fv-set. S is a set of integers (possibly including repetitions),
S_i. We have discovered that

     N(S) = max (S_i + i - 1),  taken over i ≥ 1 with S_i > 0,

where the S_i are indexed so that S_i ≥ S_(i+1). N(S) is then the
maximum of a set of terms g(S,i), where

     g(S,i) = S_i + i - 1,  1 ≤ i,  S_i > 0.

Not all these terms contribute directly to the maximum. Some, where
S_i = S_(i+1), merely act as place-holders, increasing the value of the
index, i, but are themselves smaller than another term. In other words,

     if S_j = S_(j+1), then g(S,j) = g(S,j+1) - 1 < g(S,j+1).

Hence, max g(S,i) > g(S,j).

Furthermore, the side condition requiring S_i ≥ S_(i+1) is not
summarized in the term function, g, and must be handled separately.

We introduce a different method of computing N(S), which both
eliminates the need for the side condition S_i ≥ S_(i+1) and
emphasizes the "important terms" of the g(S,i), i.e., those which may
contribute to the maximum. They are characterized by S_i > S_(i+1).
The new method of computing N(S) makes use of a new set of terms,
f(S,v) = I(S,v) + v - 1. Rather than an index, v is a "value", an
integer which may be found in the set S. I(S,v) is the number of
elements ≥ v. I(S,v) incorporates the properties of the ordering
condition S_i ≥ S_(i+1). It furthermore serves to give the index of
the "important term" i whose S_i = v. Its extension to values v not
occurring in S introduces new "unimportant" terms (where
S_i > v > S_(i+1)), but remains manageable. By making the definition
of f(S,v) conditional on v > 0 and I(S,v) > 0, so that f(S,v) = 0
where these conditions fail to hold, N(S) can be computed by
unrestrictedly maximizing f(S,v) over v.

Once we have decided to use the sequences f(S,v) for the computation
of N(S), we can discuss the effect on N(S) of joining another set Y to
S. Suppose Y is a set consisting of m copies of the integer x. Then

     f(S ∪ Y,v) = f(S,v)      if v > x,
                = f(S,v) + m  if v ≤ x.

In other words, joining a new set Y to S increases terms with low
enough values by a constant amount. Except for cases when x > S_i for
all S_i, no new important terms are introduced by the join. When Y is
adjoined to two sets, S and T, we again have

     f(S ∪ Y,v) = f(S,v) + m  if v ≤ x

and  f(T ∪ Y,v) = f(T,v) + m  if v ≤ x

Thus, if two sets S and T compare so that N(S) > N(T), it may happen
that for some sufficiently small v, v0,

     f(S,v0) < f(T,v0).

Then, by adjoining some Y to both S and T, we can increase the values
of the terms generated by v0 in the new sets:

     f(S ∪ Y,v0) = f(S,v0) + m

     f(T ∪ Y,v0) = f(T,v0) + m

If the value u which maximizes f(S,u) is larger than v0, then the
terms generated by v0 will, for large enough m, be larger than f(S,u).
When this happens, we will have

     N(S ∪ Y) < N(T ∪ Y) for that Y.
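
The two ways of evaluating N(S), and the constant shift produced by a join, can be sketched in a few lines. The encoding is an assumption made for illustration: an fv-set is a Python list of integers, set union with repetitions is list concatenation, and the helper names (`count_ge`, `N_by_g`, `N_by_f`) are not from the thesis.

```python
def count_ge(S, v):
    """I(S, v): how many elements of S are >= v."""
    return sum(1 for s in S if s >= v)


def f(S, v):
    """f(S, v) = I(S, v) + v - 1 when v > 0 and I(S, v) > 0, else 0."""
    i = count_ge(S, v)
    return i + v - 1 if v > 0 and i > 0 else 0


def N_by_g(S):
    """Index form: order S descending and maximize S_i + i - 1 over S_i > 0."""
    ordered = sorted((s for s in S if s > 0), reverse=True)
    # enumerate() counts from 0, so s + i here equals S_i + i - 1.
    return max((s + i for i, s in enumerate(ordered)), default=0)


def N_by_f(S):
    """Value form: maximize f(S, v) over v, with no ordering step."""
    return max((f(S, v) for v in range(1, max(S, default=0) + 1)), default=0)


S, T = [4, 1], [2, 1, 2]
Y = [1, 1]  # m = 2 copies of x = 1
shift = [f(S + Y, v) - f(S, v) for v in range(1, 5)]
print(N_by_g(S), N_by_f(S), N_by_g(T), N_by_f(T), shift)
```

Only the term for v = 1 moves when Y is joined, which is the behaviour the comparison argument relies on.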


Example:

     S = (4 1),  T = (2 1 2).

We must reorder S and T:

     S' = (4 1),  T' = (2 2 1)

     i    g(S,i)    g(T,i)

     1      4         2
     2      2         3
     3                3

Thus, N(S) = 4, N(T) = 3, so N(S) > N(T).

Suppose we adjoin Y = (1 1) to both S and T and repeat the process
of evaluation.

     S ∪ Y = (4 1 1 1),  T ∪ Y = (2 1 2 1 1)

     (S ∪ Y)' = (4 1 1 1),  (T ∪ Y)' = (2 2 1 1 1)

     i    g(S ∪ Y,i)    g(T ∪ Y,i)

     1        4             2
     2        2             3
     3        3             3
     4        4             4
     5                      5

Now, N(S ∪ Y) = 4, and N(T ∪ Y) = 5. Thus Y has reversed the S - T
comparison.

In terms of the f(S,v) and f(T,v) representation, we have:

     v    f(S,v)    f(T,v)    f(S ∪ Y,v)    f(T ∪ Y,v)

     5      0         0           0             0
     4      4         0           4             0
     3      3         0           3             0
     2      2         3           2             3
     1      2         3           4             5

Notice that f(S,1) = 2, while f(T,1) = 3. So long as there is no
v ≤ 1 for which f(S,v) > f(T,1) (as there is not in this example)
increasing f(S,1) and f(T,1) by a sufficiently large m will ensure
that N(T ∪ Y) > N(S ∪ Y).
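
The f-table of this example can be regenerated mechanically, and the smallest number of joined ones that reverses the comparison can be searched for. This is a sketch under the assumption that fv-sets are Python lists and union is concatenation; `smallest_reversal` and its `limit` parameter are hypothetical helpers introduced here.

```python
def f(S, v):
    """f(S, v) = I(S, v) + v - 1 when v > 0 and I(S, v) > 0, else 0."""
    i = sum(1 for s in S if s >= v)  # I(S, v)
    return i + v - 1 if v > 0 and i > 0 else 0


def N(S):
    """N(S) computed as the unrestricted maximum of f(S, v)."""
    return max((f(S, v) for v in range(1, max(S, default=0) + 1)), default=0)


S, T, Y = [4, 1], [2, 1, 2], [1, 1]

# Rows are (v, f(S,v), f(T,v), f(S ∪ Y,v), f(T ∪ Y,v)) for v = 5 down to 1.
rows = [(v, f(S, v), f(T, v), f(S + Y, v), f(T + Y, v)) for v in range(5, 0, -1)]
for row in rows:
    print(*row)


def smallest_reversal(S, T, x=1, limit=50):
    """Smallest m such that joining m copies of x gives N(S ∪ Y) < N(T ∪ Y)."""
    for m in range(limit + 1):
        if N(S + [x] * m) < N(T + [x] * m):
            return m
    return None  # no reversal found within the (arbitrary) search limit


print(smallest_reversal(S, T))
```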

The following section derives more formally the predicate equivalent
to (1) which does not refer explicitly to C. This predicate is
abbreviated A ≼ B (or B ≽ A). B ≽ A just when, for each integer w > 0,
there is an integer v, satisfying w ≥ v > 0, such that f(B,v) ≥ f(A,w).
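
Because the witness v may be any value not exceeding w, the predicate amounts to comparing f(A,w) with a running maximum of f(B,v). The sketch below assumes fv-sets are Python lists; the name `precedes` is an illustrative choice, not the thesis's.

```python
def f(S, v):
    """f(S, v) = I(S, v) + v - 1 when v > 0 and I(S, v) > 0, else 0."""
    count = sum(1 for s in S if s >= v)  # I(S, v)
    return count + v - 1 if v > 0 and count > 0 else 0


def precedes(A, B):
    """True when, for each w > 0, some v with w >= v > 0 has f(B, v) >= f(A, w)."""
    running_max = 0  # max of f(B, v) over 0 < v <= w
    # Beyond the largest element f(A, w) is 0, so the test holds trivially there.
    for w in range(1, max(A + B, default=0) + 1):
        running_max = max(running_max, f(B, w))
        if f(A, w) > running_max:
            return False
    return True


print(precedes([3, 2, 2, 1], [3, 1, 1, 1, 1]), precedes([2, 1], [3]))
```

The check needs only one pass over the values 1 up to the largest element of either set, and it never constructs an outside set C.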

Following the proofs of the equivalence of B ≽ A and (1), a section
detailing the application of the comparison theorem to the leaves-in
algorithm is presented. Here, we describe the "interchangeability"
requirement conditions, as well as discussing the theorem's
applicability to comparisons of n(B ∪ C) to n(A ∪ C). It is shown that
both fv-sets must be members of the same shape-set, as well as
satisfying an inclusion condition on their fringe-shape-sets, before
one of the fv-sets can be discarded.

The remaining section of this chapter shows the power of the
comparison theorem. Shape-sets produced during the leaves-in algorithm
from EEPT's associated with the n**3 and (n**3)/2 basic algorithms have
certain particularly useful properties. These properties allow advance
prediction of the outcome of many fv-set comparisons between members of
the same shape-set. Because the initial EEPT's satisfy an "equality"
condition on their root and leaf-shapes, we can show that only two
fringe-shape-set categories of fv-sets occur in each shape-set.
Furthermore, we can show, using various properties of the function
f(S,v), that after n(x) is added to each shape-set of node x, only one
fv-set in each category will survive the comparisons. This serves to
place a constant upper-bound on the number of fv-sets generated at each
node, limiting the effort required by the search to a constant times
the number of operators in the given expression.

IV.1 The Fv-Set Comparison Theorem

The letters A, B, and C here denote fv-sets, with S(A), S(B), and
S(C) their fringe sets.

Let I(A,v) = the number of integers j in A such that j ≥ v.

Properties of I(A,v):

     1.  w ≥ v implies that I(A,v) ≥ I(A,w)

         since the set of values in A which are ≥ v contains the
         set of values in A which are ≥ w.

     2.  If S(A) and S(C) are disjoint, then I(A ∪ C,v) = I(A,v) + I(C,v)

Let f(A,v) = I(A,v) + v - 1  if I(A,v) > 0 and v > 0
           = 0, otherwise.

Let f(A) = max_v f(A,v)

Theorem 1:  f(A) = N(A)

Let A ≼ B mean:  ∀w ∃v [w > 0 → w ≥ v > 0 ∧ f(B,v) ≥ f(A,w)]

Lemma 1:  A ≼ B → ∀C [A ∪ C ≼ B ∪ C]

Lemma 2:  A ≼ B → N(A) ≤ N(B)

Lemma 3:  ¬[A ≼ B] → ∃C [N(A ∪ C) > N(B ∪ C)]

Theorem 2:  A ≼ B ≡ ∀C [N(A ∪ C) ≤ N(B ∪ C)]

We follow this brief statement of the results of this section with their
detailed proofs.

Suppose A = (A[0], ..., A[m], A[m+1]=0), with A[i] ≥ A[i+1] ≥ 0, for m ≥ i ≥ 0.
Properties of A:

1.  If v = A[i] > 0, then ∃j such that v = A[j] > A[j+1].

    Proof:  There is at least one j such that v = A[j], namely j = i.
            Suppose ∀j such that v = A[j], A[j] ≤ A[j+1].
            By construction of A, A[j] = A[j+1].
            Therefore ∀j ≥ i, v = A[j].
            In particular v = A[m+1] = 0, contradicting
            the assumption that v > 0.

2.  If A[j] ≥ v > A[j+1] then I(A,v) = j+1.

    Proof:  I(A,v) is the number of A[i]'s such that A[i] ≥ v.
            In constructing A, we get
                 A[j] ≤ exactly j+1 members of A,
                 A[0], ..., A[j].
            All of these are ≥ v, so I(A,v) ≥ j+1.
            v > A[j+1], so v > A[i] for all i ≥ j+1.
            Hence, I(A,v) = j+1.

Theorem 1:  f(A) = N(A)

    where:

         f(A) = max_v f(A,v)

         N(A) = max (A[i]+i, 0),  taken over m ≥ i ≥ 0 with A[i] > 0
                (where A is as before)

    Let g(A,i) = A[i]+i  if A[i] > 0 and m ≥ i ≥ 0,
               = 0, otherwise.

    Then N(A) = max_i g(A,i).

Proof:

1.  If A[i] > A[i+1] and m ≥ i ≥ 0, then f(A,A[i]) ≥ g(A,i).
    Proof:  I(A,A[i]) = i+1 > 0
            If A[i] = 0, f(A,A[i]) = 0 = g(A,i)
            else, f(A,A[i]) = I(A,A[i]) + A[i] - 1
                            = i + 1 + A[i] - 1
                            = g(A,i)

2.  If A[i] = A[i+1] > 0 and m ≥ i ≥ 0 then g(A,i) < g(A,i+1)
    Proof:  A[m+1] = 0, so m > i, or m ≥ i+1.
            Therefore, g(A,i) = A[i]+i = A[i+1]+i < A[i+1]+i+1
            g(A,i) < A[i+1]+i+1 = g(A,i+1)

3.  If g(A,i) = max_j g(A,j) > 0, then A[i] > A[i+1] and m ≥ i ≥ 0
    Proof:
            If A[i] is not > A[i+1], A[i] = A[i+1].
            Also, g(A,i) > 0, so A[i] > 0 and m ≥ i ≥ 0.
            Then g(A,i) < g(A,i+1), by 2.
            But g(A,i) = max_j g(A,j) ≥ g(A,i+1). Contradiction.

4.  If g(A,i) = max_j g(A,j), then g(A,i) = f(A,A[i]).
    Proof:
            By 1 and 3, if g(A,i) > 0. Otherwise g(A,i) = 0 for
            all i, 0 ≤ i ≤ m, and f(A,A[i]) = 0, since A[i] = 0
            for all i.

5.  ∀v ∃i f(A,v) ≤ g(A,i).
    Proof:
            If A[i] ≥ v > 0 for some i such that m ≥ i ≥ 0,
            then ∃j such that A[j] ≥ v > A[j+1].
            Therefore, I(A,v) = j+1, so
            f(A,v) = I(A,v)+v-1 = j+v ≤ j+A[j] = g(A,j).
            If v ≤ 0 then f(A,v) = 0 ≤ g(A,0).
            If v > A[0] then I(A,v) = 0, so f(A,v) = 0 ≤ g(A,0).

We have:  ∃i such that g(A,i) = max_j g(A,j), and
          f(A) ≥ f(A,A[i]) = g(A,i) = max_j g(A,j).

          ∀v ∃i f(A,v) ≤ g(A,i).

          For some V, f(A) = f(A,V) ≤ g(A,i) ≤ max_j g(A,j)

Therefore f(A) = max_j g(A,j) = N(A).

Lemma 1A:  If v > 0, then

     if I(A,v) > 0,  f(A ∪ C,v) = f(A,v) + I(C,v)
     and if I(A,v) = 0,  f(A ∪ C,v) = f(C,v) ≥ f(A,v) + I(C,v)

Proof:

     Suppose v > 0.

     Then f(X,v) = I(X,v) + v - 1 unless I(X,v) = 0.

     We know that I(A ∪ C,v) = I(A,v) + I(C,v).

     Then:

     1.  Suppose I(A,v) > 0. Then, since I(C,v) ≥ 0,
              I(A ∪ C,v) > 0.
         Therefore f(A ∪ C,v) = I(A ∪ C,v) + v - 1
                              = I(A,v) + I(C,v) + v - 1.
         Also, f(A,v) = I(A,v) + v - 1,
         so f(A ∪ C,v) = f(A,v) + I(C,v).

     2.  Suppose I(A,v) = 0.
         Then I(A ∪ C,v) = I(C,v).

         Case: I(C,v) = I(A ∪ C,v) > 0.
              Then f(C,v) = I(C,v) + v - 1 = I(A ∪ C,v) + v - 1
                          = f(A ∪ C,v)
              Also, f(A,v) = 0.
              Therefore
                   f(A,v) + I(C,v) = I(C,v) ≤ I(C,v) + v - 1
                                   ≤ f(A ∪ C,v).

         Case: I(C,v) = I(A ∪ C,v) = 0.
              Then f(C,v) = 0 = f(A ∪ C,v).
              Also, f(A,v) + I(C,v) = 0 ≤ f(A ∪ C,v).

Lemma 1B:  If v > 0 then

     f(A ∪ C,v) ≥ f(A,v) + I(C,v)

Proof:  I(A,v) > 0 ∨ I(A,v) = 0.

     If I(A,v) > 0,  f(A ∪ C,v) = f(A,v) + I(C,v) by Lemma 1A
          ∴ f(A ∪ C,v) ≥ f(A,v) + I(C,v)

     If I(A,v) = 0,  f(A ∪ C,v) ≥ f(A,v) + I(C,v) by Lemma 1A
          ∴ f(A ∪ C,v) ≥ f(A,v) + I(C,v)
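
Lemmas 1A and 1B lend themselves to an exhaustive check over small fv-sets. The sketch below is not part of the proof; it assumes fv-sets are lists with disjoint fringe sets, so that union is concatenation, and the enumeration bounds in `small_fv_sets` are arbitrary choices.

```python
from itertools import combinations_with_replacement


def count_ge(S, v):
    """I(S, v): how many elements of S are >= v."""
    return sum(1 for s in S if s >= v)


def f(S, v):
    """f(S, v) = I(S, v) + v - 1 when v > 0 and I(S, v) > 0, else 0."""
    i = count_ge(S, v)
    return i + v - 1 if v > 0 and i > 0 else 0


def small_fv_sets(max_value=3, max_size=3):
    """Every multiset of integers 1..max_value with at most max_size members."""
    for size in range(max_size + 1):
        for combo in combinations_with_replacement(range(1, max_value + 1), size):
            yield list(combo)


def lemma_1a_holds(A, C, v):
    joined = f(A + C, v)
    if count_ge(A, v) > 0:
        return joined == f(A, v) + count_ge(C, v)
    return joined == f(C, v) and joined >= f(A, v) + count_ge(C, v)


violations = [
    (A, C, v)
    for A in small_fv_sets()
    for C in small_fv_sets()
    for v in range(1, 5)
    if not lemma_1a_holds(A, C, v)
]
print(len(violations))
```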

Lemma 1:  A ≼ B → ∀C [A ∪ C ≼ B ∪ C]

Recall that A ≼ B means ∀w ∃v [w > 0 → w ≥ v > 0 ∧ f(B,v) ≥ f(A,w)]

Proof:  We must show, assuming A ≼ B, that for each w > 0 there
        is a v', w ≥ v' > 0, such that

             f(B ∪ C,v') ≥ f(A ∪ C,w).

        We know that for each w > 0 there is v such that
        w ≥ v > 0 and f(B,v) ≥ f(A,w).

        Also, w ≥ v implies I(C,v) ≥ I(C,w),

        so f(B,v) + I(C,v) ≥ f(A,w) + I(C,w).

        Case: I(A,w) > 0

             Then f(A,w) > 0, so f(B,v) > 0, giving I(B,v) > 0,
             ∴ f(A ∪ C,w) = f(A,w) + I(C,w)
             and f(B ∪ C,v) = f(B,v) + I(C,v), by Lemma 1A,
             so f(B ∪ C,v) ≥ f(A ∪ C,w),
             and we may take v' = v.

        Case: I(A,w) = 0.

             Then f(A ∪ C,w) = f(C,w) by Lemma 1A
             f(C,w) ≤ f(C,w) + I(B,w) ≤ f(B ∪ C,w), also by Lemma 1A.
             Therefore, f(A ∪ C,w) ≤ f(B ∪ C,w)
             and we may take v' = w.

Lemma 2:  A ≼ B → N(A) ≤ N(B)

Proof:  We show A ≼ B → f(A) ≤ f(B). (Then use Theorem 1.)

        A ≼ B ≡ ∀w ∃v [w > 0 → w ≥ v > 0 ∧ f(B,v) ≥ f(A,w)]

        ∴ ∀(w > 0) ∃v [f(B,v) ≥ f(A,w)]
        f(B) = max_v f(B,v) ≥ f(B,v)
        ∴ ∀w > 0  f(B) ≥ f(A,w)
        also f(A,0) = 0, and f(B) ≥ 0
        ∴ ∀w  f(B) ≥ f(A,w)

        or f(B) ≥ f(A) = max_w f(A,w) = f(A,v*), for some v*

Lemma 3:  ¬[A ≼ B] → ∃C [N(A ∪ C) > N(B ∪ C)]

     ¬[A ≼ B] means ∃w ∀v [w > 0 ∧ [w ≥ v > 0 → f(B,v) < f(A,w)]]

Proof:  If N(A) > N(B), choose C empty.

        Otherwise, for w as in the expansion of ¬[A ≼ B] above, let

             m = N(B) - f(A,w)

        m ≥ 0, for

             N(B) = f(B) ≥ f(A) ≥ f(A,w)

        Take C to consist of m + 1 occurrences of the integer w.

        ∀v ≤ w    f(A ∪ C,v) = f(A,v) + m + 1
              and f(B ∪ C,v) = f(B,v) + m + 1
        ∀v1 > w   f(A ∪ C,v1) = f(A,v1)
              and f(B ∪ C,v1) = f(B,v1)

        f(A ∪ C,w) = f(A,w) + m + 1 > N(B)

        N(B) ≥ f(B,v1) = f(B ∪ C,v1) for all v1 > w
        for all v ≤ w  f(A ∪ C,w) = m + 1 + f(A,w) > m + 1 + f(B,v) = f(B ∪ C,v)
        ∴ f(A ∪ C,w) > N(B ∪ C)
        ∴ N(A ∪ C) > N(B ∪ C)
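
The proof of Lemma 3 is constructive, so the witness C can be built mechanically. The following sketch assumes fv-sets are Python lists and union is concatenation; `witness` is a hypothetical helper that follows the construction above (m + 1 copies of the first w at which the comparison fails), and `small_fv_sets` with its bounds exists only to drive an exhaustive check.

```python
from itertools import combinations_with_replacement


def f(S, v):
    """f(S, v) = I(S, v) + v - 1 when v > 0 and I(S, v) > 0, else 0."""
    i = sum(1 for s in S if s >= v)  # I(S, v)
    return i + v - 1 if v > 0 and i > 0 else 0


def N(S):
    """N(S), computed as max over v of f(S, v) (Theorem 1)."""
    return max((f(S, v) for v in range(1, max(S, default=0) + 1)), default=0)


def witness(A, B):
    """Return a C with N(A ∪ C) > N(B ∪ C), or None when A ≼ B holds."""
    if N(A) > N(B):
        return []  # the empty C already separates A and B
    running_max = 0  # max of f(B, v) over 0 < v <= w
    for w in range(1, max(A + B, default=0) + 1):
        running_max = max(running_max, f(B, w))
        if f(A, w) > running_max:  # this w shows that A ≼ B fails
            m = N(B) - f(A, w)
            return [w] * (m + 1)
    return None


def small_fv_sets(max_value=3, max_size=3):
    for size in range(max_size + 1):
        for combo in combinations_with_replacement(range(1, max_value + 1), size):
            yield list(combo)


failures = []
for A in small_fv_sets():
    for B in small_fv_sets():
        C = witness(A, B)
        if C is not None and not N(A + C) > N(B + C):
            failures.append((A, B, C))

print(witness([2, 1], [3]), len(failures))
```

For A' = (2 1) and B' = (3) the construction yields two copies of the integer 1, the same C that the worked example in this section uses.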


Theorem 2:  A ≼ B ≡ ∀C [N(A ∪ C) ≤ N(B ∪ C)]

Proof:  A ≼ B → ∀C [A ∪ C ≼ B ∪ C]                 by Lemma 1
              → ∀C [N(A ∪ C) ≤ N(B ∪ C)]           by Lemma 2
        ¬[A ≼ B] → ¬∀C [N(A ∪ C) ≤ N(B ∪ C)]       by Lemma 3
        ∴ ∀C [N(A ∪ C) ≤ N(B ∪ C)] → A ≼ B
        ∴ A ≼ B ≡ ∀C [N(A ∪ C) ≤ N(B ∪ C)]


Example:  Use of the fv-set comparison theorem:

     Given A = (3 2 2 1) and B = (3 1 1 1 1),
     we investigate whether A ≽ B, or B ≽ A.

     We will compute f(A,v) and k(A,v) = max f(A,w), taken over
     v ≥ w > 0, as well as f(B,v) and k(B,v).

     We then need only ask if, for all v > 0, f(A,v) ≤ k(B,v)
     to determine if A ≼ B.

     For all v, f(A,v) ≤ k(B,v), so A ≼ B.
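
The f and k columns for this pair can be computed as follows. This is an illustrative sketch that assumes fv-sets are Python lists; the function name `k` mirrors the text, while the column order of `table` is a choice made here.

```python
def f(S, v):
    """f(S, v) = I(S, v) + v - 1 when v > 0 and I(S, v) > 0, else 0."""
    i = sum(1 for s in S if s >= v)  # I(S, v)
    return i + v - 1 if v > 0 and i > 0 else 0


def k(S, v):
    """k(S, v) = max of f(S, w) over v >= w > 0."""
    return max((f(S, w) for w in range(1, v + 1)), default=0)


A, B = [3, 2, 2, 1], [3, 1, 1, 1, 1]
top = max(A + B) + 1  # one value past the largest element

# Columns: v, f(A,v), k(B,v), f(B,v), k(A,v)
table = [(v, f(A, v), k(B, v), f(B, v), k(A, v)) for v in range(top, 0, -1)]
for row in table:
    print(*row)

a_precedes_b = all(f(A, v) <= k(B, v) for v in range(1, top + 1))
b_precedes_a = all(f(B, v) <= k(A, v) for v in range(1, top + 1))
print(a_precedes_b, b_precedes_a)
```

Under these definitions the first test succeeds and the second fails at v = 1, so A ≼ B holds and B ≼ A does not.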

     We try A' = (2 1), B' = (3)

     v    f(A',v)    k(B',v)    f(B',v)    k(A',v)

     4       0          3          0          2
     3       0          3          3          2
     2       2          2          2          2
     1       2          1          1          2

     Here, neither A' ≽ B', since f(B',3) = 3 > k(A',3) = 2,
     nor B' ≽ A', since f(A',1) = 2 > k(B',1) = 1.

     Potentially, if enough 'ones' are united with both A' and B',
     A' ∪ C will eventually achieve a larger N(A' ∪ C) than will B' ∪ C:

     Let C = (1 1). Then

     A' ∪ C = (2 1 1 1),  N(A' ∪ C) = max(2+0, 1+1, 1+2, 1+3) = 4

     B' ∪ C = (3 1 1),    N(B' ∪ C) = max(3+0, 1+1, 1+2) = 3

     Of course, N(A') = 2, N(B') = 3, so their "actual" situation
     can be reversed.

IV.2 Application of the Comparison Theorem to the Leaves-in Algorithm

Each fv-set produced during the leaves-in algorithm ultimately
becomes part of an fv-set which is compared against all other fv-sets
at some node. An fv-set A which satisfies n(A) > n(B) for some fv-set
B at node x is not chosen as the best method for computing x.
If we had conditions which guaranteed that, for all C,
n(A ∪ C) ≥ n(B ∪ C), and if each C joinable to A by some series of
parallel connections were joinable to B as well, then A would not need
further investigation. In particular, we would not have to generate
fv-sets A ∪ C for each possible C, since we would know that an at
least equally good alg-tree exists: B ∪ C. Thus, A need not be
retained in A's shape-set at x, so that generating all possible joins
of A to C's is avoided.

The previous section has shown that if (and only if) A ≽ B,
then for all C, N(A ∪ C) ≥ N(B ∪ C). We still do not know the
relationship between n(A ∪ C) and n(B ∪ C), however. Furthermore, we
must develop a criterion for the interchangeability of two fv-sets so
that any C joinable to one can be joined to the other. The present
section develops sufficient conditions for the application of the
comparison theorem in deleting fv-sets from the shape-sets of the
leaves-in algorithm.

First, we claim, by virtue of Property 1 of an alg-tree, that
two fv-sets A and B which are both members of the same shape-set are
interchangeable. For that alg-tree property shows that, if A can be
a sub-alg-tree of some alg-tree, and is a member of G(S,x), then so
can any member B of G(S,x). But G(S,x) is just the shape-set containing
A and B at x. Hence, A and B are interchangeable, if both belong to
G(S,x) for some node x and shape S.

The extension of the comparison theorem to n(A ∪ C) and n(B ∪ C)
requires more thought. Basically, n(A) may equal N(A) or N(A ∪ 1),
depending on whether A's result can occupy an input array of A's
AATA¹, or not. This is determined by A's fringe-shape set. The
problem is that A may satisfy N(A) > N(B), while n(A) < n(B), for
example if n(A) = N(A), while n(B) = N(B ∪ 1). The method used to
join fringe-shape sets allows determination of conditions
under which n(A ∪ C) ≥ n(B ∪ C) for all C.

     ¹ Associated alg-tree algorithms. See Section 3, Chapter III.

Let C be an fv-set rooted at some node which is not x or a
descendant of x. We say that C is an outside-x fv-set. If A is a
member of some shape-set of x then when C is joined to A, the fringe-
shape sets of A and of C set-unite. This fact allows us to derive
conditions on the fringe-shape sets of A and of B, two fv-sets of the
same shape-set, which guarantee that, if A ≽ B, then
n(A ∪ C) ≥ n(B ∪ C) for all C.

Let F(C), where C is an fv-set, be C's associated fringe-shape set.

Theorem:  If A ≽ B, and F(B) ⊇ F(A),

          then n(A ∪ C) ≥ n(B ∪ C) for all C.

Proof:  F(A ∪ C) = F(A) ∪ F(C), by the steps of the leaves-in algorithm.

        Therefore

        F(B ∪ C) = F(B) ∪ F(C) ⊇ F(A) ∪ F(C) = F(A ∪ C)

        Hence, if A ∪ C occurs in shape-set S, then if S ∈ F(A ∪ C),
        S ∈ F(B ∪ C). Therefore, n(A ∪ C) = N(A ∪ C) implies that
        n(B ∪ C) = N(B ∪ C).

        Case: n(A ∪ C) = N(A ∪ C). Then

             n(B ∪ C) = N(B ∪ C) ≤ N(A ∪ C) = n(A ∪ C)
             so n(B ∪ C) ≤ n(A ∪ C).

        Case: n(A ∪ C) = N(A ∪ C ∪ 1). Then

             n(B ∪ C) ≤ N(B ∪ C ∪ 1) ≤ N(A ∪ C ∪ 1)

             so n(B ∪ C) ≤ n(A ∪ C)

In summary then, if A and B belong to the same shape-set, and A ≽ B,
and F(B) ⊇ F(A), then A may be deleted from the shape-set without
compromising the optimality of the compilation of the given expression
into AATA's.

IV.3 The Leaves-In Algorithm, with Comparison Theorem

The following algorithm is to be applied to the nodes of the
expression's parse-tree, E, in the following order. It is to be applied
to a node x only after being applied to each of the descendant nodes of
x, taken in any order.

(1)  If  x  Is  a  leaf  of  E,  then  set  G(S,x)  =  [(0)]  for 
each  S.  (0)  Is  an  fv-set  constant,  containing  no 
shapes  In  Its  fringe-shape-set.  Exit. 

(2)  Initialize  each  shape-set  G(S,x)  to  the  empty  set. 

(3)  Find  an  EEP1  e  which  matches  E  at  x.  Suppose  S  Is  the 
root-shape  of  e.  Find,  for  each  leaf  1  of  e,  the  node 

of  E  corresponding  to  1  In  the  match  of  e  to  E  at  x. 

(A)  Select  one  member,  for  each  i,  of  G(leaf-shape(l) ,L^). 

Join  the  selected  combinations  of  fv-sets,  using  fv-set- 
Joln  to  combine  the  fv-sets,  and  set  union  to  combine 
their  fringe-shape  sets.  Add  the  resulting  augmented 
fv-set  to  shape-set  S. 

(5)  Repeat  (A)  for  each  distinct  combination  of  fv-sets 
selectable  by  (A). 

(6)  Repeat  (3)-(5)  for  each  EEPT. 

(7)  Calculate  n(x)  =  min  n(A^),  where  A^  ranges  over  all 
fv-sets  In  any  shape-set  G(S,x).  Ado  the  fv-set 
(n(x))S  to  each  shape-set  G(S,x).  Here,  (n(x))S  Is  an 
fv-set  containing  the  integer  n(x)  only,  and  whose 
fringe-shape  set  contains  S.  Replace  G(ft,x)  with  (n(x)). 

(8)  Compare  each  pair  of  fv-sets  A  and  B  in  each  shape-set 
G(S,x).  If 

A  >  B,  and  F(B)  z>  F(A) 
then  delete  A  from  G(S,x). 

(9)  Exit. 
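
The visiting order required here, each node only after all of its
descendants, is a post-order traversal. The sketch below illustrates only
that ordering; the `children` mapping and the node numbers are hypothetical,
and steps (1)-(9) themselves are not implemented.

```python
from typing import Dict, List


def leaves_in_order(children: Dict[int, List[int]], root: int) -> List[int]:
    """Return the nodes so that every node follows all of its descendants."""
    order: List[int] = []

    def visit(x: int) -> None:
        # Descendants first (their relative order does not matter) ...
        for y in children.get(x, []):
            visit(y)
        # ... then the node itself, where steps (1)-(9) would be applied.
        order.append(x)

    visit(root)
    return order


# Hypothetical parse-tree of (A*B)*(C*D): leaves 1, 2, 4, 5; operators 3, 6, 7.
example_children = {7: [3, 6], 3: [1, 2], 6: [4, 5]}
example_order = leaves_in_order(example_children, 7)
```

In any such ordering the root of E is the last node visited.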
The root of E will be the last node visited by this procedure.

Along the way records can be kept, describing which fv-sets gave rise
to each retained fv-set. The identity of the best alg-tree (fv-set)
available for computing node x should be associated with some copy of
(n(x)), say that which replaced G(Ω,x).

Apply the following algorithm to isolate each alg-tree whose
AATA is part of the optimum compilation.

The following algorithm is applied first at the root of E.

AATA(x): (1) Locate G(Ω,x). Collect, into set N, all
the nodes included in the alg-tree which G(Ω,x)'s
single member represents. This collection is
accomplished by following the records of fv-set
generation until the fringe-set nodes are
reached. Let the fringe-set nodes be F_i.

(2) For each i, compute AATA(F_i).

(3) Print N, perhaps with additional information,
indicating the EEPT rooted at each node in N,
and other information which is recorded in the
fv-set tags. Exit.

IV.4  Leaves-In Algorithm Effort Requirement

In the following section, we demonstrate that, by restricting the
given set of EEPT's appropriately, the effort required in applying the
leaves-in algorithm to any given expression, E, is bounded by a linear
function of the number of operators in E. We demonstrate this by
showing that the comparison-and-deletion step of the leaves-in algorithm
leaves no more than 2 fv-sets in each shape-set. Since no more than
K shape-sets will appear at each operation node of E's parse-tree, no
more than 2*K fv-sets occur at any node. Therefore, at each node, no
more than (2*K)^m fv-sets will be generated, where m ≥ number of leaves
of any EEPT. This number is reduced to 2*K by fewer than (2*K)^(2m)
comparison-and-deletion steps, since each pair of generated fv-sets is
compared at most once. Therefore, the operators of E, W in number,
generate approximately W * [(2*K)^m + (2*K)^(2m)] steps. Since K and m
are constant with W, this shows that the effort is bounded by a linear
function of W.
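
The arithmetic of this bound can be illustrated with a short calculation.
This is a sketch only: the function name is hypothetical, and the number of
comparison-and-deletion steps per node is assumed to be the square of the
number of generated fv-sets, that is (2*K)^(2m).

```python
def effort_bound(W: int, K: int, m: int) -> int:
    """Approximate step count W * [(2K)^m + (2K)^(2m)] for W operators.

    K is the number of shapes and m bounds the number of leaves of any EEPT.
    The pairwise-comparison term (2K)^(2m) is an assumption of this sketch.
    """
    generated = (2 * K) ** m          # fv-sets generated at one node
    comparisons = generated ** 2      # pairwise comparison-and-deletion steps
    return W * (generated + comparisons)


# With K and m fixed, doubling W doubles the bound: the growth is linear in W.
small = effort_bound(10, 3, 2)
large = effort_bound(20, 3, 2)
```

The constant factor is large, but it does not depend on the size of the
expression.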

The critical step in our derivation of the linear effort-bound
lies in bounding the number of fv-sets in any shape-set by 2. It
is at this stage that we must impose a restriction on the given
EEPT's.

Suppose we follow the steps of the leaves-in algorithm to a point
just after all fv-sets have been computed for a given node. The next
step involves computing

n(x) = min n(a_i)

for all fv-sets a_i in any shape-set. The special fv-set (n(x)) is then
added to each shape-set, representing the "null" alg-tree, a result
stored in an intermediate array. We can show, under some circumstances,
that (n(x)) and the comparison theorem reduce the number of fv-sets
in each shape-set to 2, at most. If we consider only a finite number
K of shapes, and hence shape-sets, this limits the number of fv-sets at
each node to a constant 2*K. Ultimately, this will let us show that
if the expression contains W operators, only (2*K) * W fv-sets are
retained, at most. We thus bound the search effort.

We require (and will assume throughout this section) that all
EEPT's satisfy:

Let root-shape(e) = S and leaf-shape(i,e) = T_i,
for each leaf i of e.

For all leaves i of e,
if S ≠ T_i, then either S = Ω or T_i = Ω.
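
The restriction is easy to test mechanically. The sketch below is
illustrative only: shapes are represented as strings, the name `OMEGA` for
the shape Ω and the function name are hypothetical, and an EEPT is reduced
to its root-shape and the list of its leaf-shapes.

```python
from typing import Sequence

OMEGA = "Ω"  # hypothetical string standing for the shape Ω


def satisfies_restriction(root_shape: str, leaf_shapes: Sequence[str]) -> bool:
    """True if every leaf-shape T_i equals the root-shape S, or S or T_i is Ω."""
    return all(
        t == root_shape or root_shape == OMEGA or t == OMEGA
        for t in leaf_shapes
    )
```

For instance, an EEPT with root-shape r and leaf-shapes (r, Ω) is
acceptable, while one with root-shape r and a leaf of shape c is not.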

The purpose of this restriction becomes clear in Theorem 1.
Basically, the restriction guarantees that only one shape in each
fringe-shape set can ever be relevant, regardless of the set-unions
an fv-set enters. For each shape-set G(S,x), that relevant shape in
each of its members' fringe-shape sets is S. Furthermore, it ensures
that the special fv-set (n(x)), which is added to each shape-set,
can be used to delete any fv-set in that shape-set. Here n(x) = min n(A)
for all fv-sets A in shape-sets of node x. The restriction thus
effectively relaxes the requirement that F(B) ⊆ F(A) before B may be
deleted by A. Theorem 1 partitions each shape-set into two classes. The
remaining theorems show how (n(x)):

(1) replaces one of these classes, and

(2) leaves only elements B,C in the other such that B ≥ C
and C ≥ B.

Thus, we show that only one element remains in each class.

One final comment. The restriction we impose is light enough that
all EEPT's generated by the n³ algorithms we studied satisfy it.
Furthermore, most of the n³/2 algorithms also produce acceptable EEPT's.
The theorems we prove here are thus not vacuous.

Theorem 1: If A is an fv-set in shape-set G(S,x), then F(A) as computed
by the leaves-in algorithm satisfies:

If S ≠ Ω, and T ≠ S then T ∉ F(A).

Proof: By induction on level(x). Level(x) is an integer defined for
each node x in the parse-tree E as:

level(x) = 1 + max level(x_i), for all sons x_i of x,
and level(x) = 0 if x has no sons (is a leaf of E).

When level(x) = 0, x is a leaf of E, and the fringe-shape-
sets of all fv-sets A of all leaves are empty. Therefore,

T ∉ F(A).

When level(x) = l > 0, we assume the theorem for all nodes
y such that level(y) < level(x). In particular, we assume
it for all descendants x_j of x.

Each fringe-shape-set F(A) in shape-set G(S,x), S ≠ Ω, is
generated by F(A) = ∪_j F(A_j), where F(A_j) is a fringe-shape-set
in a shape-set G(S_j,x_j) of some descendant node x_j of x.
Furthermore, x_j is the jth leaf of the EEPT e rooted at x
which generates F(A). Also,

root-shape(e) = S

leaf-shape(j,e) = S, or (by the EEPT restriction)
                = Ω.

If leaf-shape(j,e) = S, then F(A_j) is chosen from a
shape-set G(S,x_j) such that S ≠ Ω.

But x_j is a descendant of x, and by assumption, a
fringe-shape-set in a shape-set G(S,x_j) such that
S ≠ Ω satisfies

if T ≠ S then T ∉ F(A_j).

Also, if leaf-shape(j,e) = Ω, F(A_j) comes from shape-
set G(Ω,x_j). But this shape-set's members all have
empty fringe-shape-sets, so

T ∉ F(A_j).

Therefore if T ≠ S, then T ∉ F(A_j) for every j, so
T ∉ ∪_j F(A_j) = F(A).

A following step of the leaves-in algorithm replaces
shape-set Ω at node x with the fv-set (n(x)) = X. When
added to shape-set Ω, F(X) = ∅. Also, X is adjoined to
shape-set G(S,x), with F(X) = {S}. After this step, if
T ≠ S ≠ Ω, and if B is an fv-set in shape-set S, then
T ∉ F(B).

Thus, the theorem is true for nodes x such that
level(x) = l, and hence true for all nodes x in E.

Corollary: If A is an fv-set in shape-set G(S,x), and S ≠ Ω,
then F(A) is either empty, or contains only S.

Proof: If F(A) contained T ≠ S, it would violate Theorem 1 of
this section.

Thus, we can divide the fv-sets A in shape-set G(S,x) into two
disjoint classes, those such that
F(A) = {S}
and those such that
F(A) = ∅.

The first class will be called "1-class", the second "0-class".
During the leaves-in algorithm, for each node x we compute n(x),
and adjoin (n(x)) to each shape-set of x. X = (n(x)) becomes a part of
1-class of each non-Ω shape-set. We will show that all members B of
a given 1-class satisfy B ≥ X. Since B and X are both members of the
same shape-set, G(S,x), and F(B) = F(X) = {S}, B is deletable by X.
Therefore, after the deletion, only one member, X, is left in each
1-class. Similarly, we can show that only one member is left in each
0-class.

This demonstrates that in each shape-set, only 2 members remain after
the comparison-and-deletion step of the leaves-in algorithm.

Theorem 2: If n is an integer ≥ 0, and B an fv-set, then if
n ≤ N(B), (n) ≤ B.

Proof: N(B) ≥ n implies that

∃v f(B,v) ≥ n.

Also, (n) has the property that
f((n),w) ≤ n for all values w,
since if w > n, I((n),w) = 0

so f((n),w) = 0 ≤ n
and if n ≥ w > 0, I((n),w) = 1

so f((n),w) = I((n),w) - 1 + w = w ≤ n.

Of course, if w = 0, f((n),w) = 0 ≤ n.

Therefore, ∃v ∀w f(B,v) ≥ n ≥ f((n),w).

In particular, for each w > 0 ∃v such that
w ≥ v > 0 and
f(B,v) ≥ f((n),w);

therefore, B ≥ (n).
Theorem 3: (n(x)) ≤ A for all fv-sets A in any 1-class.

Proof: By definition n(x) = min (n(A_i)) for all fv-sets A_i at node x.
Therefore n(x) ≤ n(A) = N(A) for all A in any 1-class.

Therefore (n(x)) ≤ A, by Theorem 2.

Theorem 4: If B is an fv-set in 0-class, and B ≥ (n(x)), then B
will be deleted.

Proof: F(B) is empty, by definition of 0-class. The 1-class of
the shape-set containing B contains X = (n(x)), and hence
both (n(x)) and B belong to the same shape-set. Also,

F(X) ⊇ F(B) = ∅. Therefore, B is deletable by X.

Hence, if B ≥ (n(x)), B will be deleted by the leaves-in
algorithm.

Suppose B is an fv-set of 0-class which remains after the deletion step.

Lemma 1: N(B) < n(x).

Proof: If N(B) ≥ n(x), then B ≥ (n(x)) by Theorem 2, and
would be deleted, by Theorem 4.

Lemma 2: N(B ∪ 1) > N(B).

Proof: n(B) = N(B ∪ 1), since B ∈ 0-class. Hence

N(B ∪ 1) = n(B) ≥ n(x) > N(B).

Theorem 5: If N(C ∪ 1) > N(C) then f(C,1) = N(C),
and N(C ∪ 1) = N(C) + 1.

Proof: N(C) = max f(C,v)

N(C ∪ 1) = max f(C ∪ 1,v)

f(C ∪ 1,v) = f(C,v) + I(1,v)

           = f(C,v) if v > 1,

           = f(C,1) + 1 else.

∀v > 1 f(C ∪ 1,v) = f(C,v), while

f(C ∪ 1,1) = f(C,1) + 1.

If f(C,1) < N(C), then:

1 + f(C,1) ≤ N(C), so

∀v f(C ∪ 1,v) ≤ N(C),

or N(C ∪ 1) ≤ N(C), contradicting the theorem's hypothesis.
Therefore, f(C,1) ≥ N(C).

But f(C,1) ≤ max f(C,v) = N(C)
so f(C,1) = N(C).

Also, for all v, f(C,v) ≥ f(C ∪ 1,v) - 1.
In particular, for a value v* at which f(C ∪ 1,v) attains its maximum,
f(C,v*) ≥ f(C ∪ 1,v*) - 1 = N(C ∪ 1) - 1,
so N(C) + 1 ≥ N(C ∪ 1);
also, N(C ∪ 1) > N(C), so N(C ∪ 1) ≥ N(C) + 1,
giving N(C ∪ 1) = N(C) + 1.
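
Theorem 5 can be spot-checked by brute force on small fv-sets. The sketch
below is a check under assumptions, not a proof: fv-sets are modeled as
lists of positive integers, I(C,v) is assumed to count the members of C
that are at least v, and f(C,v) = I(C,v) - 1 + v when I(C,v) > 0, else 0.
The function names and the search limits are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Sequence


def count_at_least(fv: Sequence[int], v: int) -> int:
    """I(C, v): assumed to count the members of C that are >= v."""
    return sum(1 for c in fv if c >= v)


def f(fv: Sequence[int], v: int) -> int:
    """f(C, v) = I(C, v) - 1 + v when I(C, v) > 0, else 0 (assumed reading)."""
    i = count_at_least(fv, v)
    return i - 1 + v if (v > 0 and i > 0) else 0


def N(fv: Sequence[int]) -> int:
    """N(C) = max over v of f(C, v)."""
    return max((f(fv, v) for v in range(1, max(fv, default=0) + 1)), default=0)


def check_theorem_5(max_value: int = 4, max_size: int = 4) -> int:
    """Check both conclusions for every small C with N(C ∪ 1) > N(C).

    Returns how many fv-sets satisfied the hypothesis and were checked.
    """
    checked = 0
    for size in range(max_size + 1):
        for combo in combinations_with_replacement(range(1, max_value + 1), size):
            c = list(combo)
            if N(c + [1]) > N(c):                 # hypothesis of Theorem 5
                assert f(c, 1) == N(c)            # first conclusion
                assert N(c + [1]) == N(c) + 1     # second conclusion
                checked += 1
    return checked
```

Calling `check_theorem_5()` raises an `AssertionError` if either conclusion
fails for some fv-set within the limits.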

Theorem 6: If N(C) = f(C,1) = f(B,1) = N(B),
then C ≥ B and B ≥ C.

Proof: f(C,1) = N(B) = max f(B,v).

Therefore, f(C,1) ≥ f(B,v), for all v.
Thus, for all v > 0, w = 1 satisfies
v ≥ w > 0 and
f(C,w) = f(C,1) ≥ f(B,v).

Therefore C ≥ B.

Similarly, because f(B,1) = N(C), B ≥ C.

Theorem 7: If C remains in 0-class after the deletion step,

N(C) = n(x) - 1, and
N(C) = f(C,1).

Proof: By Lemma 2, N(C ∪ 1) > N(C), so

by Theorem 5, N(C ∪ 1) = N(C) + 1.

Also, by Lemma 1, N(C) < n(x),

and n(x) ≤ N(C ∪ 1) = N(C) + 1.

We have N(C) < n(x) ≤ N(C) + 1 for integers
N(C), n(x), so
N(C) = n(x) - 1.

Part 2 follows directly from Lemmas 1, 2 and Theorem 5.

Theorem 8: If B and C remain in 0-class after the deletion step,

B ≥ C and C ≥ B. Therefore one may be deleted.

Proof: By Theorem 7, N(B) = n(x) - 1 = N(C).

Also by Theorem 7, N(B) = f(B,1) and N(C) = f(C,1).

Therefore by Theorem 6, B ≥ C and C ≥ B.

Since both B and C are in 0-class, they are in the same
shape-set, and F(B) = F(C) = ∅.

Therefore, one may be deleted.

An immediate consequence of Theorem 8 is that only one fv-set
remains in 0-class after the deletion step. Also, Theorem 3 has shown
that only one fv-set remains in 1-class. Since 0-class and 1-class of
a shape-set together cover that shape-set, only 2 fv-sets remain in
each shape-set after the deletion step.


CHAPTER  V 


V.1  Summary of Results

We have described a transformation, loop-fusion, on programs. If
X is a sequence of two loops satisfying certain conditions on the sets
of variables accessed, loop-fusion(X) is a single loop, computationally
equivalent to X, which executes no more operations than X. Such
equivalent, time-conservative transformations, applied to any program,
yield new programs, with different characteristic space requirements,
which are valid alternative programs for the programming task performed
by the given program. We can search the space of such programs for one
which requires least space.

We present two sets of alternative programs for computing the matrix
assignment statements C ← A * B and C ← A + B. Each set of programs
forms the basis for an "equal-time" collection of algorithms for evaluating
matrix arithmetic expressions in + (matrix addition) and * (matrix
multiplication) on square N-by-N matrices. Each compilation of a given
matrix arithmetic expression into sequences of algorithms in a given
equal-time collection of algorithms requires the same amount of execution
time. These equal-time collections are derived by loop fusion from the set
of algorithms forming the basis for the collection.

An algorithm for choosing that compilation of any given matrix
arithmetic expression, E, which uses the fewest 2-arrays⁷ is presented.
This algorithm, called the leaves-in algorithm, uses the properties of
loop-fusion to "tailor" algorithms, selected from an equal-time collection
C, to fit each part of E. It searches over all possible compilations of
E into algorithms of C, potentially generating a number of algorithms
which grows exponentially with W, where there are W operators in E.

A general technique is then presented for reducing the number of
cases which an exhaustive search for an optimum alternative must examine.


⁷ A 2-array is a set of variables capable of holding one N-by-N
matrix.
Such searches can generate alternatives by assigning values one by one
to the state-variables which describe an alternative. Usually, the value
of the criterion function of a partially-specified alternative cannot be
computed without completing the specification in all possible ways. The
given technique allows some partially-specified alternatives to be rejected
without generating all completions, by guaranteeing that for every
completion C ∪ A of one such alternative, A, there is a completion C ∪ B
of another, B, which is at least as good as C ∪ A. Thus, generating all
the completions C ∪ A of A is unnecessary in the search for an
optimum-valued alternative.

This technique is applied to the search performed by the leaves-in
algorithm. Here, an "alternative" is an algorithm for evaluating the
entire expression, E. The value of the criterion function, N(S), of an
alternative S is the number of 2-arrays needed by S. A partially-specified
alternative A is an algorithm for evaluating some subexpression E(A) of
E. A predicate P(A,B) is defined on partially-specified alternatives A
and B, equivalent to

∀C N(A ∪ C) ≥ N(B ∪ C),

where A ∪ C is a completion of A, and B ∪ C is a completion of B, derived
by using B instead of A in subexpression E(A). P(A,B) may be evaluated
without generating all alternatives C. When P(A,B) is true, A may be
rejected without investigating all its possible completions, for each is
known to be no better (i.e., no smaller) than some completion of B.

Using predicate P(A,B) to reject alternatives generated during the
leaves-in algorithm, a modified leaves-in algorithm is produced. This
modified algorithm investigates only k * W alternatives, where W is the
number of operators in the expression E, and where k does not depend on W.

We have studied a set of program alternatives for implementing one
class of matrix arithmetic expressions, searching for a program which
uses fewest 2-arrays while never computing the value of any element of a
subexpression more than once. We must admit that not all programs
satisfying these criteria have been investigated. In particular, we have
studied only those programs derivable by loop fusion. Other methods
of constructing algorithms for evaluating matrix arithmetic expressions
may exist, possibly yielding algorithms which use fewer 2-arrays than
those the leaves-in algorithm can discover. Nevertheless, searches over
"program technologies" like that which the leaves-in algorithm investigates
are interesting in their own right, and may well yield near-optimum
results.

Two types of generalization of our work come to mind. Certainly,
different optimization criteria could be used, in particular, allowing
program combinations which are not minimum-connection-time, and optimizing
some combination of program execution time and memory space. Also, many
generalizations of "matrix arithmetic expressions" as we have defined them
appear interesting. We feel that it may be worthwhile to indicate some
of the possible expression generalizations which our current technique
cannot handle.

First, we could consider expressions containing more than one
occurrence of a particular subexpression. The data flow diagram of such an
expression contains nodes having more than one incident line. The
presence of such nodes makes n(x), for some nodes x in the diagram,
dependent on more than n(x_i) for all descendants x_i of x.

For example, consider:


The value of n(x) depends on whether y is to be computed before, or
after x. If before, then the common node, z, must be computed before
w. If after, the orders [w;z] and [z;w] are both possible.

Secondly, we could allow use of the associative laws of matrix
addition and multiplication by relaxing the requirement that an
expression be fully parenthesized. Here, each possible association could be
generated, and the leaves-in algorithm could be applied to each resulting
binary parse-tree. A more elegant, less time-consuming search should
be devised, however.

Thirdly, we could consider allowing the variables of the expression
to be rectangular matrices with the usual conformability requirements
imposed. This generalization we believe lies in the scope of the leaves-
in algorithm. The major requirement is a generalization of the definition
of fv-set to allow arrays of various sizes to hold the different inputs
to an associated alg-tree algorithm.

Other programming language constructs far different from matrix
arithmetic expressions could conceivably be technologically optimized.
These constructs must be such that several alternative implementations
are available for each instance of the construct.

One of the more interesting of such examples, in which the available
alternatives are particularly clear, concerns constructs which specify
parallelism. These constructs indicate to a compiler that certain
operations can "proceed in parallel", i.e., that any ordering of these
operations which preserves the relative order of the operations in each
"parallel sequence" yields equivalent results. These constructs are
intended for an environment in which more than one processor is available.
However, where only one processor exists, they permit the compiler to
choose a space-minimal ordering of the given operations.

An important class of problems, with consequences for program
optimization, concerns the development of transformations which generate
programs computationally equivalent to a given program. Such transformations
may involve change of data representation, change of sequence of certain
operations, or more drastic changes, making use of mathematical properties
of the task the program implements. We have presented one such
transformation (loop fusion). We have made use of another, re-ordering of
operation sequences, in generating the basic n³ algorithms for matrix
addition and multiplication. Other transformations exist. Notably, we can
consider developing techniques for compiling functions, described as
collections of recursive subroutines, into efficient iterative programs.
At another level, there may well exist program communication
alternatives whose value can best be investigated by an exhaustive search.
For example, variables in a program may exist which the programmer has
allocated separately, for conceptual reasons, but whose contents are
never relevant simultaneously. Compilers could locate and combine these.

One could also conceive of automatic choices being made of
alternative numerical procedures for various phases of certain programs.
A search procedure, in conjunction with an automatic error analysis, may
be useful to determine the actual sensitivity of the results to small
perturbations in the input values. Such an "experimental" approach,
combined with alternative numerical procedure trials, might yield smaller
error bounds than can conventional human-implemented numerical analysis.

The general problem of program optimization is difficult for several
reasons. First, the number of possible programming approaches to many
interesting programming tasks seems to be extremely large. The size of
this number prohibits an exhaustive generation of each possible program
capable of performing the given task. Second, the possible programming
alternatives for a given task are difficult to determine. This is partly
the fault of the languages in which these tasks are programmed. These
languages often require that the programmer specify more details of the
procedure to be followed than are essential to the task to be performed.
Third, a given programming task must be optimally programmed not once,
but many times. Each time that task appears as a subtask of some larger
programming task, the program which implements it optimally must change
to best fit the new context. Thus, one cannot hope to produce an optimized
program for all task contexts. One could profitably develop algorithms
for rapidly finding such optimum programs. The hope of producing such
programmer-aiding algorithms motivated this study.

The work reported here has barely brushed the surface of the study
of efficient programs. We have gained some insight into only a few of
the devices programmers use so freely in producing their programs. The
possibility of reducing programmer effort by providing programming
algorithms, rather than rules-of-thumb, motivated our study. We feel
that many additional interesting and useful programmer-aiding algorithms
remain to be discovered.


APPENDIX I


The following algorithm, due to S. Winograd, can also be used to
compute the matrix assignment statement C ← A * B. We present a
derivation of the algorithm, and the counts of the number of scalar
additions and multiplications it requires.

Computing, for all i and j,

$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$

yields C = A * B, where A, B and C are (square) matrices. Below, x denotes
a row of A and y a column of B, so that each C_ij is an inner product of
the form \sum_k x_k y_k.

n even:

$$(x_{2i} + y_{2i-1})(x_{2i-1} + y_{2i}) = x_{2i-1} y_{2i-1} + x_{2i} y_{2i} + x_{2i} x_{2i-1} + y_{2i} y_{2i-1}$$

Therefore,

$$\sum_{i=1}^{n} x_i y_i = \sum_{i=1}^{n/2} (x_{2i} + y_{2i-1})(x_{2i-1} + y_{2i}) - \sum_{i=1}^{n/2} x_{2i} x_{2i-1} - \sum_{i=1}^{n/2} y_{2i} y_{2i-1}$$

Of the terms on the right, the last two need be computed
only once for each row of A or column of B. Thus, the
operation counts are:

*:  n²·(n/2) + n·(n/2 + n/2)

+:  n²·(3n/2) + n·(n/2 + n/2)

or

*:  n³/2 + n²

+:  3n³/2 + n²

n odd:

$$\sum_{i=1}^{n} x_i y_i = x_n y_n + \sum_{i=1}^{(n-1)/2} (x_{2i} + y_{2i-1})(x_{2i-1} + y_{2i}) - \sum_{i=1}^{(n-1)/2} x_{2i} x_{2i-1} - \sum_{i=1}^{(n-1)/2} y_{2i} y_{2i-1}$$

The operation counts in this case are:

*:  n² + n²·(n-1)/2 + n·(n-1)

+:  n² + n²·3(n-1)/2 + n·(n-1)

or

*:  n³/2 + 3n²/2 - n

+:  3n³/2 + n²/2 - n
[Program listing for the algorithm above, in the loop notation of this
thesis. The two-dimensional layout of the loops over I, J and K and the
shape annotations are not recoverable; the legible statements are, in
order:]

V[I] ← 0
U[I] ← 0
C[I,J] ← 0
U[I] +← A[I,K] * A[I,K-1]
V[I] +← B[K,I] * B[K-1,I]
C[I,J] +← (A[I,K] + B[K-1,J]) * (A[I,K-1] + B[K,J])
C[I,J] -← U[I] + V[J]
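
The even-n derivation can be exercised with a short Python sketch. This is
an illustration written for this appendix's formulas, not a transcription
of the loop-notation listing: the function name, the 0-based indexing, and
the multiplication counter are assumptions of the sketch, and matrices are
plain lists of lists.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def winograd_matmul(A: Matrix, B: Matrix) -> Tuple[Matrix, int]:
    """Multiply square matrices of even order n using the pairing identity.

    Returns the product and the number of scalar multiplications performed,
    which should equal n**3 / 2 + n**2.
    """
    n = len(A)
    if n % 2 != 0:
        raise ValueError("this sketch handles even n only")
    half = n // 2
    mults = 0

    # Row terms: sum of x_{2i} * x_{2i-1} for each row x of A (computed once).
    U = [0] * n
    for i in range(n):
        for k in range(half):
            U[i] += A[i][2 * k] * A[i][2 * k + 1]
            mults += 1

    # Column terms: sum of y_{2i} * y_{2i-1} for each column y of B.
    V = [0] * n
    for j in range(n):
        for k in range(half):
            V[j] += B[2 * k][j] * B[2 * k + 1][j]
            mults += 1

    # Each element needs only n/2 further multiplications.
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = -U[i] - V[j]
            for k in range(half):
                total += (A[i][2 * k] + B[2 * k + 1][j]) * (A[i][2 * k + 1] + B[2 * k][j])
                mults += 1
            C[i][j] = total
    return C, mults
```

The counter makes the trade visible: the number of multiplications is about
half that of the standard method, at the cost of additional additions.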
4. Combining two copies of (1) to compute D * (A * B):

[Program listing in the loop notation of this thesis. The two-dimensional
layout of the loops over I, J and K and the shape annotations are not
recoverable. The legible statements are, in order; "?" marks the illegible
name of the result array:]

U[I] ← 0
U[I] +← A[I,K] * A[I,K-1]
V ← 0
V +← B[K,J] * B[K-1,J]
C[I,J] ← -U[I] - V
C[I,J] +← (A[I,K] + B[K-1,J]) * (A[I,K-1] + B[K,J])
W[I] ← 0
W[I] +← D[I,K] * D[I,K-1]
X ← 0
X +← C[K,J] * C[K-1,J]
?[I,J] ← -W[I] - X
?[I,J] +← (D[I,K] + C[K-1,J]) * (D[I,K-1] + C[K,J])


APPENDIX  II 




We present here a program, written in APL,⁸ which demonstrates the
"leaves-in" algorithm. This algorithm describes how a given expression's
parse-tree can best be computed by alg-tree associated algorithms (AATA's).
Each AATA is grown by parallel connection from a set of EEPT's,
representing a set of elementary algorithms, which the user supplies. The
"best" method of computing the given expression is that composition of
AATA's which uses the fewest arrays for communicating results from one
AATA to inputs of another.

The inputs to the program describe the EEPT's tree structure, and
the shapes and operators associated with their nodes. Also, the
structure of the expression's parse-tree is given. The result is a list of
alg-trees, whose AATA's are a series of elementary algorithms which,
executed in the given order, produce the required expression value. Each
alg-tree is described by listing the nodes in the tree which it includes,
the EEPT rooted at each node, and the intermediate array assigned to hold
each input, and the result. All EEPT's listed in one alg-tree are to be
fused. The root of the alg-tree is always listed first, and is associated
with the number of the intermediate array which is to hold the AATA's
result.

As an example, we describe the EEPT's of the n³ algorithms, and
their optimal assignment to the parse-tree of the expression (A*B) * (C*D).

EEPT's:  (1)  (2)  (3)


Uses one temporary to hold
the final result.


⁸ APL is a conversational language, developed by K. E. Iverson, L. M.
Breed, and R. H. Lathwell for the IBM 360/50. The language is described
in [2].
The internal structure of the APL program is notable for its use of the
effort-limiting results we presented. The "core" of the method lies in
the comparison of generated fv-sets. We retain a "current value" of n(x)
throughout our generation of fv-sets at node x. Since we know that n(x)
is the only element of the 1-set of each shape-set, we do not copy it.
Furthermore, space for only one member of each non-Ω shape-set (the single
retained 0-set member) is reserved. As an fv-set is created, it is tested
to see if it reduces the current value of n(x), and, if it is a 0-set
member, to see if it will be retained in the 0-set. These comparisons never
require the actual fv-set comparison algorithm. Because of the way we
represent fv-sets, and retain values n(A), for certain fv-sets A, the
comparisons are between single integers only.

External Representations:

(1) Inputs:

(a) Tree structure:

The structure of a tree is input in a single vector
called 'FATHERS'. FATHERS[I] = J, where node I has father
J in the tree. In labeling the nodes of a tree, the father
of I must be given a number greater than I, so that FATHERS
must satisfy FATHERS[I] > I. Furthermore, left siblings must
be numbered less than their right siblings. (If these
rules are violated, the results are unpredictable.)

The FATHERS-entry for the root of the tree is not
part of the tree-structure. It must be present, but its
value carries non-structural information.

(b) Node labels:

The internal label of a node I is given by a code
number, K_I, in the Ith position of vector OPERATORS.
Codes are used to indicate the operator (* or +) for
intermediate nodes, or, in the case of leaves of EEPT's,
to indicate shape (Ω, r, or c). Since the root-node of
an EEPT has an operator, the EEPT's root-shape is coded
in place of the FATHER of the EEPT's root.

Codes (operators):

0 - variable
1 - *
2 - +

Codes (shapes):

1 - Ω
2 - r
3 - c

(2) To input EEPT's, type:

EEPTS 

The program responds with alternate requests for FATHERS and
OPERATORS, which should be answered with the appropriate vectors.
Each pair of requests allows the input of another EEPT. EEPT's
are identified by the order of their input, the first one being
given an identification number of '1'. Any scalar or single-
element vector typed in response to FATHERS is ignored, and
terminates the input of EEPT's.

To execute the leaves-in algorithm, after EEPT's have been input,
type:

TREE

The response is a FATHERS, OPERATORS request-pair which should be
answered with the parse-tree description. The leaves-in algorithm
then executes.

(3)  Output: 

The output is a sequence of alg-trees, assignments of EEPT's
to the nodes of the parse-tree. Each alg-tree is given in three
vectors:

NODES[I] - lists the node-number of the node in the parse-tree
associated with the root of each EEPT, and with each EEPT leaf.
NODES[1] always lists the root of the alg-tree.

EEPTS[I] - the identifying number of the EEPT associated with
node NODES[I] in this alg-tree.

TEMPS[I] - the number of a temporary matrix assigned to hold
a result associated with node NODES[I], or zero.

The sequence in which alg-trees are listed is the sequence
in which the algorithms they represent are to execute. This
ensures that an intermediate matrix used as input to a given AATA
is computed before it is accessed.

Examples:

A. Input of EEPT's:

1. F: 3 3 2
O: 2 1 1

Here the tree is labeled

[Figure: the labeled tree]

Its structure, given in F: (fathers), is
3 3 _

signifying that nodes 1 and 2 have father '3', and reserving
the last position of the vector (which always represents the
"father" of the tree's root) for other information.

The operators are _ _ 1, i.e., node 3 has operator 1 = *,
and the other nodes, known to be leaves of the tree, have no
operators.

The additional information given in the two vectors
represents the shape associated with each node:

Shapes:

F: _ _ 2

O: 2 1 _

Thus, nodes 3 and 1 have shape 2 = r, while node 2 has shape 1 = Ω.

B. Input example: Entry of EEPT's.

The indented line following the □: line is typed by the user,
not the computer.


EEPTS 


□:

3  3  2 
OPERATORS 
□: 

2  11 


□: 

3  3  3 
OPERATORS 
□: 

3  1 


FATHERS 

□: 

3  3  1 
OPERATORS 
□: 

3  2  1 


FATHERS 

□: 

3  3  2 
OPERATORS 
□: 


A 


A 


o 

C. Output interpretation

Example 1:


TREE 

FATHERS 

□:

3  3  0 
OPERATORS 
□: 

0  0  1 

ASSIGNED 1 MATRICES:
NODES: 3 2 1
EEPTS: 3 0 0
TEMPS: 1 0 0


The input, FATHERS and OPERATORS, describes the tree E:

[Figure: tree E]

The output (starting with the line reading "ASSIGNED 1 MATRICES")
gives the number of MATRICES (2-arrays) needed in computing the root
of E. The remainder of the output presents the alg-tree(s) whose
AATA(s) correctly compute E. In this case, a single alg-tree suffices:
intermediate array T1 holds the result.

[Figure: the alg-tree for E, built from EEPT 3, with its result in T1]

Example 2:


TREE 

FATHERS 

□:

5 5 6 6 7 7 0
OPERATORS
□:

0 0 0 0 1 1 1
ASSIGNED 1 MATRICES:

NODES: 7 6 4 3 5 2 1
EEPTS: 3 1 0 0 2 0 0
TEMPS: 1 0 0 0 0 0 0

[Figures: tree E; the one alg-tree; the EEPT's used]

Example 3:


TREE 

FATHERS 

□:

7 7 8 8 9 9 10 10 13 13 14 14 15 15 0
OPERATORS
□:

0 0 0 0 0 0 1 2 1 2 0 0 1 1 1
ASSIGNED 2 MATRICES:

NODES: 13 10 8 4 3 7 2 1 9 6 5
EEPTS: 3 4 4 0 0 1 0 0 2 0 0
TEMPS: 1 0 0 0 0 0 0 0 0 0 0

NODES: 15 14 12 11 13
EEPTS: 3 1 0 0 0
TEMPS: 2 0 0 0 1

[Figure: the tree]


E. Internal Representations:

(1) Fringe-value sets.

The contents of each fringe-value set S is represented as
a table of f(S,v), v = 1, 2, 3, 4.

Thus, the fv-set

S = (3 3 2 1 1 0), where
f(S,4) = 0
f(S,3) = 4
f(S,2) = 4
f(S,1) = 5

is recorded as the APL vector 5 4 4 0.

Two fv-sets can be joined to produce a third in one APL
statement, based on the component-by-component addition
of vectors. The comparison theorem is most easily applied
in this form, as well.
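
To make the vector form concrete, here is a small Python sketch (an illustration only, not the APL code). It assumes, from the tabulated values above and from what can be read of the FVST and JOIN listings, that f(S,v) equals (v-1) plus the number of members of S that are at least v, and is 0 when there are no such members; under that assumption two vectors are joined by adding them component by component and subtracting v-1 wherever both entries are nonzero. The names `fvst`, `join`, and `f_table` and the constant `MXVL = 4` are chosen for the sketch.

```python
MXVL = 4  # assumed vector length, matching v = 1, 2, 3, 4 in the text


def fvst(x):
    """Vector for the single-integer fv-set (x): 1 2 ... x, then zeros."""
    return list(range(1, x + 1)) + [0] * (MXVL - x)


def join(a, b):
    """Component-wise join of two fv-set vectors.

    Entries are added; where both are nonzero the shared offset v-1
    (the 0-based position) is subtracted so it is counted only once.
    """
    return [p + q - (pos if p and q else 0)
            for pos, (p, q) in enumerate(zip(a, b))]


def f_table(members):
    """f(S,v) computed directly from the members (assumed definition)."""
    table = []
    for v in range(1, MXVL + 1):
        count = sum(1 for m in members if m >= v)
        table.append(v - 1 + count if count else 0)
    return table


if __name__ == "__main__":
    s = [3, 3, 2, 1, 1, 0]
    vec = [0] * MXVL
    for m in s:
        vec = join(vec, fvst(m))
    print(vec, f_table(s), max(vec))
```

Under these assumptions the joined vector agrees with the table computed directly from the members, and the value of an fv-set is simply the largest entry of its vector, so comparing values reduces to comparing single integers.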

(2) The collection of fv-sets associated with a node.

Each node is associated with 3 shape-sets, one
each for the codeable shapes: Ω, r, c. Matrix SHAPESET[I;J]
holds the "tag" of node I's Jth shape-set. Each shape-set
only holds one fv-set: either the single 0-set fv-set, for
shapes r and c, or the fv-set whose single member is the
"value" of the node I, n(I), for shape Ω. These are the
only distinct fv-sets which need representation at each
node. Ω's fv-set may be selected as part of shape-set r
or c, and treated as the only 1-set member. After all
fv-sets of a node are generated and compared, the surviving
fv-sets are placed in table FVSET[I;]. SHAPESET[I;J] holds
the index K in FVSET of the single fv-set which is
represented in shape-set J of node I. FVSET[K;] holds the
4-element vector representing that fv-set.


(3) The parse-tree.

A parse-tree structure is represented internally by
"downward" pointing links, as well as by node order. Node I
of the parse-tree is associated with a vector, SON[I;J],
giving the node number of node I's Jth son. OPR[I] gives
the code for node I's operator. Nodes are numbered such
that each entry of SON[I;J] < I. As a result, we can
visit nodes of the parse-tree in increasing order of node
number with assurance that each node's descendants have
been visited before that node.

(4)  Tags. 

Each fv-set, when stored in FVSET[I;], is associated
with tags, giving the FVSET indices of the fv-sets from
which fv-set I was created. In addition, other information
about the fv-set is stored in the same array.

Global  Tables 

Node N of the expression's parse-tree is associated with:

SON[N;I] = the number of the Ith son of node N

OPR[N] = the "operator" of node N

SHAPESET[N;S] = the FVSET index of the single surviving member
of the shape-set S of node N. This fv-set is
a member of the 0-set of the shape-set if S is 'r'
or 'c'. When S is 'r' or 'c', the 1-set is given
by SHAPESET[N;OMEGA].

Fv-set I is associated with:

TAG[I;IALG] = the number of the EEPT which generated fv-set I,
that is, the EEPT forming the base of the alg-
tree whose fringe-set I represents.

TAG[I;INODE] = the node of one of whose shape-sets I is a member.

TAG[I;IVAL] = the value of fv-set I, i.e., the maximum over j of
FVSET[I;j].

TAG[I;ISAM] = used during the output-phase to indicate
which node of I's fringe-set the result-set
of the alg-tree can agree with.

MATR[I] = during the output-phase, holds the intermediate
array number of the array which is assigned to
hold the result-set of the alg-tree's algorithm.

TAG[I;ISON+K] = the FVSET index of the Kth "son" (generating
fv-set) of fv-set I.

FVSET[I;V] = holds fv-set I's f(I,V) value.

The Ith EEPT read in by EEPTS is associated with:

EEPT[I;1] = the number of the node in the EEPT-forest
representing I's root.

EEPT[I;2] = the root-shape of EEPT I.

EEPT[I;3] = the number of leaves of EEPT I.

The "forest" of EEPT's stores all nodes of all EEPT's.
Each node is assigned a number distinct from the numbers
assigned any other EEPT's nodes by "relocating" the numbers
assigned nodes on input. A given node, I, of this forest
is associated with the following information.

SONE[I;K] = the forest node number of the Kth son of node I.
If I is a leaf, SONE[I;1] = 0.

OPRE[I] = the operator of node I, where I is not a leaf. If I
is a leaf of some EEPT, OPRE[I] gives I's leaf-shape.

F. Description of program operation

TREE

This "main routine", keyboard activated, accepts an expression's
parse-tree, using INTREE. It initializes and structures the arrays
needed, and calls ASSIGN to initiate phase 1. On return, it prints
the number of intermediate matrices ASSIGN finds to be needed, and
calls ALGOR to collect and print the alg-tree assigned.


virozwcnv 

7  TREE 

Cl]  INTREE  0 

[2]  NODES+vF 

[3]  MXSJb-  3 

t*0  LFAFSET*-  “111 
[5]  HXFVS+mSH*NODF.S 
C  63  HXVL+M 
C7]  IALG+1 
C  83  INODE-*-! 

C9]  IVAL+  3 

CIO]  ISMt+M 

CU]  ISOIMi 

C12]  NXTG+ISON+MXLV 

C13]  SHAPESET*-(  NODES  ,MXSR)pO 

[14]  FVSET*-(MXFVS,MXVL)pO 

CIS]  TAO+{MXr/StMXTG)oO 

C16]  FVSL+ 1 

C17]  XPSS+  112131 

CIS]  SOfbS 

C19]  OPR+O 

C20]  OMEGA*- 1 

C  21  ]  ASSIGN 

C22]  MA  TR*MXFVSp  0 

[23]  FVR+SRAPESETL  NUDES ;  1  ] 

C24]  VALUE+TAGtFVRiIVALl 

[25]  AVAIL+VALUEp  0 

[26]  (>(  *  ASSIGNED  '\VALUEi'  MATRICES: ' ) 

[27]  ALGOR  FVR 
7 

EEPTS


This "main routine", keyboard activated, reads as many EEPT's
as the user cares to supply. All are recorded in the same tables,
SONE, to hold the "sons" tree representation, and OPRE, to hold the
trees' operators. Each EEPT occupies a different (contiguous) set of
indices in these tables, with the relocation argument of INTREE used
to adjust the values stored into a true list structure. A vector
EEPT[I;] holds 3 items of information about EEPT I: (1) the index
in SONE and OPRE of its root; (2) its root-shape; (3) the number of
leaves it has.


vffEPTj  rri]v 

V  EFPTS-.Q 

[1]  FFPT+\0 

[2]  rxr.v+o 

[3]  SOUF 0  0  pO 

[4]  OPFF+- \  0 

I  cj  ]  F.FP2 :  IPTRFFpOPRE 

C  6  3  +FFPl*\lZp,0 

[  7  ]  FEP3 :  OPRF+(  (  p  OPRF)+pO  )p(0PRF,0) 

[81  SONFM  ((pSOEFm ]+(pS )[ 1 ] ) , ( pS)[ 2] )p ( (  ,SOFF) .  ( .5) ) 

[9]  LKS«-+/a/[2]S=0 

[10]  PXLVHIXm  LVE 

[11]  FFPr+-FFPT.(pOPnF)  ,F[  oF]  ,LVS 
[121  -*FFP2 

[13]  EEP1  :FEPT*-(  ( (pFFPT)*3) , 3)pZ7 FPT 

V 

ASSIGN

ASSIGN implements the leaves-in algorithm, as described.

ASSIGN visits each node of the parse-tree, T, in order of their
node-numbers, and tries to match each EEPT with that node. When a
matching EEPT is located, NEWFVS is used to update B, BT, NV, and
NT, the "surviving" best fv-sets. After all EEPT's have been tried,
the surviving fv-sets are copied into FVSET and TAG. SHAPESET gives
their indices, or is 0 for empty shape-sets.


VASSICMU. JV 
V  ASSIGN  it!  iFiTSTilSET 

[I]  N+l 

[23  FVSL+ 1 

[3]  ASG*i+ASGlx\A/SOt?ZNil=0 

[4]  F«- 1 

[5]  NT+O 

[6]  BT^(MXTGtHXSN )p0 

[7]  B+(HXVLtMXSH)  pO 

[8]  NV+U  i0 

[9]  ASG2-.F  NEWFVS  N  MATCH  FFPTLF ill 

[10] 

[II]  +ASG2x\FS(pFFPT)ll] 

[12]  BTii(tJVsr/lllD)/\MXStn+0 

[13]  8[;1W  NV 

[14]  BT[ipA’r;l]«tfr 

[15]  TST+3TUS011+1  i  >0 

[16]  BIT.  INODE  i]-*-!! 

[17]  ISET+-TST/ \MXSF! 

[18]  FVSFTLFVSL+xpISETi'}*  2  1  $S[  ;IS£T] 

[19]  TAGlFVSL+\pISETi}+  2  1  bflTT  iISFTl 

[20]  SHAPFSFTtNi  ]+-TST\FVSL+  \pISFT 

[21]  FVSL+FVST+pISET 

[22]  +ASG3 

[23]  ASGUFVSL+FVSL+l 

[  24  ]  TAGlFVSL  iWODFl+N 

[25]  SUAPFSElVli'l+LEAFSET*FVSL 

[26]  ASGZ-.tW+l 

[27]  -*ASGHx  \  NiNODFS 
7 

E NEWFVS X

One of the central phase-one routines, NEWFVS generates all
fv-sets which match EEPT E at some node N of the expression parse-tree,
T. X describes each node of T matching a leaf of E. NEWFVS generates
combinations of fv-sets which are members of the proper shape-set of
the matching nodes by using the base-2 representation of a number it
increments from 0 to 2*⍴X. When a position of this vector is 0, it
selects the single 0-set member of the shape-set. When 1, it selects
the 1-set member. A translation vector, XFSS, translates 1-set requests
into Ω-shape-set requests, since 1-sets are represented only implicitly.
Various tests exclude incorrect combinations, such as 1-set of Ω-shape-set,
or requests for an empty shape-set. Each generated combination is
JOINed, and tested against the previously-surviving best fv-sets of
the shape-set whose name is root-shape(E). Tags, including values of
the fv-sets, and "ISAM" are computed here. ISAM designates one node
(by fv-set number) whose access-shape agrees with root-shape(E), and
which can consequently be assigned a 2-array which is also assigned to
hold the result of the algorithm rooted at this node. The principal
outputs of NEWFVS are:

B[;R] gives the fv-set surviving in shape-set R, R ≠ Ω
BT[;R] gives the tags of B[;R]

NV is a scalar, holding n(S*) of the fv-set S* of smallest
value in shape-sets of this node.

NT holds a copy of the tags of that fv-set whose value appears
in NV.
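
The base-2 enumeration described for NEWFVS can be sketched in Python as follows. This is an illustration under stated assumptions, not a transcription of the APL listing: the names `selections` and `pick`, the use of one dictionary per matched node (mapping a shape code to an FVSET index, with 0 meaning an empty shape-set), and the choice of 1 as the code for Ω are all made for the sketch.

```python
def selections(num_leaves):
    """Yield one tuple of 0/1 digits per counter value 0 .. 2**num_leaves - 1."""
    for counter in range(2 ** num_leaves):
        yield tuple((counter >> k) & 1 for k in reversed(range(num_leaves)))


def pick(shape_sets, leaf_shapes, bits, omega=1):
    """Choose one fv-set index per matched node, or None if excluded.

    shape_sets[i][shape] is the (hypothetical) FVSET index stored for
    shape-set `shape` of the i-th matched node, 0 when that set is empty.
    A 0 digit takes the 0-set member of the leaf's own shape-set; a 1
    digit takes the 1-set member, which is held in the Omega shape-set.
    """
    chosen = []
    for sets, shape, bit in zip(shape_sets, leaf_shapes, bits):
        if bit and shape == omega:
            return None                  # 1-set of the Omega shape-set
        index = sets[omega] if bit else sets[shape]
        if index == 0:
            return None                  # request for an empty shape-set
        chosen.append(index)
    return chosen


if __name__ == "__main__":
    sets = [{1: 7, 2: 8, 3: 0}, {1: 9, 2: 0, 3: 0}]
    for bits in selections(2):
        print(bits, pick(sets, [2, 1], bits))
```

Each surviving combination would then be joined and compared with the current best fv-set for root-shape(E); that step is omitted here.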
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VATCWKSCmv 

V  E  NEWFVS  XiMihFVS 
Cl]  /Mp*)[l] 

[2]  -K>Mtf*0 

[3]  MD+2*MuM 

[4]  ML+x/MD 

[5]  1*0 

[6]  NPliNC+HDrl 

[7]  ■*NF2*\v/(Xt 

[8]  FVS*-  1  1  V/MPESmi[;23;JrPS5CWCt‘l<-2*Xt;l]]] 

[9]  m^mxl-2*(#T5S0)Ajrt  jl]*CWBG4 

Cio]  -*W2*iv/mso 

[11]  C*JOIBS  FVS 

[12]  3M«-(rt:i]«araM)AM>i 

[13]  SMI*SM\l 

[14]  R+EEPTtE  ;2] 

[15]  GH*/SM)*R*OMEGA 

[16]  SMS*0 

[17]  +NFB*\G*0 

[18]  SMS*-FVSlSMn 

[19]  SMMAGtSMSiISAtn 

[20]  +itF6*\SHT*0 

[21]  SMS*SMT 

[22]  HFS:CV*f/C 

[23]  -*/7F3M(Ctl]*CV)v<;*l 

[24]  *NPS*\CWfV 

[25]  0[{j?]«C 

[26]  Bit  \thISORiR}+E,  0  0  , SMS, FVS 

[27]  CV*CV+1 

[28]  ltP3:+NFS*\CV>lfV 

[29]  MV*CV 

[30]  HTH-E),0,BV,SMStPVS 

[31]  HF5iI+I 

[32]  NP2iI+I+l 

[33]  •+NP1*\I<ML 

V 

N MATCH E

Here N is a node number in the expression parse-tree, and E gives
the root of an EEPT. The value of MATCH is an APL matrix, Z. Z[I;]
describes the node matching E's Ith leaf. Z[I;1] gives the leaf-shape;
Z[I;2] the expression parse-tree node's number. The recursion is
performed by MATCH1. MATCH restructures MATCH1's result, which is an
APL vector, into the more convenient form of an APL matrix.

vrATcmnv 

V  MATCH  F\V 

[1]  tMI  MATCH  1  F 

[2]  Z«-(((pf/)*2),2)pV 

[31  -K)*iA/zr  ;2]*0 

[4]  Z«-( 0  2)p0 

V 


N MATCH1 E

Recursively matches node N of the parse-tree with node E of an
EEPT. Its value describes the list of nodes of the parse-tree which
match leaves of the sub-EEPT rooted at E.
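
A recursive match of this kind might look like the following Python sketch. It is written under assumptions rather than taken from the listing: trees are held in dictionaries (`son` and `opr` for the parse-tree, `sone` and `opre` for the EEPT forest) instead of APL matrices, a failed match returns None rather than a zero entry, and an EEPT leaf is taken to match any parse-tree node.

```python
def match1(n, e, son, opr, sone, opre):
    """Match parse-tree node n against EEPT node e.

    Returns a list of (leaf_shape, parse_node) pairs, one per leaf of the
    sub-EEPT rooted at e, or None when the operators or arities disagree.
    """
    if not sone.get(e):                     # e is a leaf of the EEPT
        return [(opre[e], n)]
    if opr[n] != opre[e] or len(son.get(n, [])) != len(sone[e]):
        return None
    pairs = []
    for n_son, e_son in zip(son[n], sone[e]):
        sub = match1(n_son, e_son, son, opr, sone, opre)
        if sub is None:
            return None
        pairs.extend(sub)
    return pairs


if __name__ == "__main__":
    son, opr = {3: [1, 2]}, {1: 0, 2: 0, 3: 1}      # parse-tree: 1 * 2
    sone, opre = {3: [1, 2]}, {1: 2, 2: 1, 3: 1}    # EEPT with leaf shapes r, Omega
    print(match1(3, 3, son, opr, sone, opre))
```

The returned pairs correspond to the rows Z[I;1] (leaf-shape) and Z[I;2] (parse-tree node number) that MATCH presents as a matrix.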


I 


WATCH1  [P]7 
7  Z+t!  MATCH 1  P;7;«7 

[1]  Z«mO 

[2]  -K)*tP*0 

[3]  ■*/-W1m/.’>0 

r 4]  rrrci'.z*  o  o 

[5]  -K) 

[6]  UTCUZ+OPnnn.H 

[7]  -K)ma/S0//F[F;]=O 

[8] 

[9]  .MpSOPK 2] 

[10]  7<*-l 

[11]  Z-iO 

[12]  /fiT2 :  Z«-Z  ,S0A'[  V ;  7],’  7!  7CP1  m'T[P;7] 

[13]  7«-7+l 

[14]  +ttTC?*\ISJ 
V 

A JOIN B

A and B are fv-sets represented by APL vectors. The value of
'A JOIN B' is the APL vector representation of fv-set C, where
C = A ∪ B.


Cl] 


7JOZWC03V 
7  C+A  JOIN  B 
C+A  +B- (  0*A  xB )  x  ( t  pB )  - 1 
7 


JOINS  X 

X is a vector of fv-set indices. The value of JOINS X is
JOIN/FVSET[X;], if this could be written in APL, i.e., an fv-set
vector representing ∪ FVSET[X[I];].


VJOINSim? 

7  V+JOINS  X ;7 
Cl]  1*1 
C2]  VH) 

[3]  JNS1 :  V*V  JOIN  FVSETlXin  i  ] 

[4]  Io-I*l 

[5]  oJI/SUxIipX 
7 


FVST  X 

X is a scalar. FVST X has as value an APL-vector representation
of the single-integer fv-set (X).

vmrcrov 

7  Zo-FVST  X 

[1]  Z+(\X)AMXVL-X)u  0 

Output phase.

Once each node has been visited, and its shape-sets and their member
fv-sets have been computed, the list of alg-trees represented must be
produced. In the course of visiting each node in the expression's
parse-tree, each node has been "evaluated". Furthermore, "tags", tracing
the ancestry of each fv-set, have been recorded. These tags represent
alg-trees in a true "sons" tree representation. The output-phase proceeds
from the root-node of the parse-tree to the leaves, collecting each
alg-tree, ordering its fringe-set, and collecting the alg-trees rooted at
each of the fringe-set nodes. Recursion reverses the printing order so that
alg-trees rooted at fringe-set members of alg-tree A print before A.
Intermediate 2-arrays are assigned "linearly". Each 2-array is given an
"available" indicator. As each alg-tree is printed, an available 2-array
is assigned for its result (root-node), and the 2-arrays assigned its
fringe-set nodes are made available.
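
The "linear" assignment of intermediate 2-arrays amounts to a first-available allocator. The Python sketch below illustrates the idea only; the class name `ArrayPool`, its methods, and the convention that an array number of zero means "no array assigned" are assumptions for the example and do not reproduce the NEWM and FREEM listings.

```python
class ArrayPool:
    """First-available allocator for a fixed number of intermediate arrays."""

    def __init__(self, count):
        self.in_use = [False] * count       # False marks an available array

    def new(self):
        """Return the 1-based number of the first available array and claim it."""
        index = self.in_use.index(False)    # raises ValueError if none is free
        self.in_use[index] = True
        return index + 1

    def free(self, numbers):
        """Make the listed arrays available again; zero entries are skipped."""
        for number in numbers:
            if number > 0:
                self.in_use[number - 1] = False


if __name__ == "__main__":
    pool = ArrayPool(2)
    first, second = pool.new(), pool.new()
    pool.free([first, 0])
    print(first, second, pool.new())
```

Sizing the pool by the value reported after phase 1 (the number of matrices "ASSIGNED") is what would guarantee that `new` always finds a free array, provided alg-trees are printed in the order the output phase prescribes.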

Output  routines: 

ALGOR  X 

X gives the FVSET index of an fv-set which is the "root" of an
alg-tree. ALGOR collects the fringe-set fv-sets, S[I], using COLLECT,
orders them by fv-set value, using ORDER, then calls itself on each
of the S[J[I]] to compute and print the alg-trees which are rooted at
each input to X's alg-trees. It then prints X's alg-tree.


VALGORIDN 
V  ALGOR  XiFiG’,1 

[1]  I+TAGlXilALG) 

[2]  -MLG3MK0 

[3]  ERROR  IK  TAG 

[*♦]  ALG2:TAGIX',IALC1<-I 

[5]  F+COLLECT  X 

[6]  G+ORDER  F 

[7]  J>1 

[8]  MLC2 

[9]  ALGliALGOR  rtGCl]] 

[10]  I-Ifl 

[11]  ALG2i+ALGl*\lSpG 

[12]  PRT  F 

[13]  TAG[XiIALC]+<> 

COLLECT X

The value of COLLECT X is a vector, listing all fv-sets in the
connection-set and fringe-set of the alg-tree rooted at the fv-set whose
index is X. An fv-set in the fringe-set is identifiable because each
such fv-set Y is the "value" of some node, and is marked during phase 1
by recording a negative TAG[Y;IALG] entry for it. Also, only these
fv-sets have non-zero TAG[Y;IVAL] entries.


^COLLECT tD]7 
7  Z+COLLECT  X-,AiI 

[1]  Z+tX 

[2]  A+TAGlXiIALC] 

[3]  -K)M;4S0 

[4]  I+£EPKAi2) 

[5]  COLUZ+-Z, COLLECT  TAClXilSON+Il 

[6]  I-I-l 

[7]  +COL1*\I>0 
7 


ORDER X

Has as value a vector, Y, which lists certain indices in vector X.
Y satisfies: TAG[X[Y[I]];IVAL] > TAG[X[Y[I+1]];IVAL]

and TAG[X[Y[I]];IVAL] > 0

Thus, according to our rule for ordering the computation of a fringe-
set, fv-sets X should be computed in the order given by Y. Note that
only fringe-set members of X are listed in Y. Fv-sets X[I] which
are in the connection set of an alg-tree, or are associated with leaves
of the parse-tree, have zero TAG[X[I];IVAL] entries, and hence are not
listed in Y.
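
The ordering rule can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the APL routine: the function name `order` and the plain list of values (standing in for the TAG[X[I];IVAL] entries) are invented for the example, and ties are resolved by keeping the earlier index first.

```python
def order(values):
    """Return 1-based positions of the positive entries, largest value first.

    values[i - 1] stands for the value recorded for the i-th fv-set in X;
    zero entries (connection-set members and parse-tree leaves) are dropped.
    """
    positive = [i for i, v in enumerate(values, start=1) if v > 0]
    # sorted() is stable, so equal values keep their original relative order.
    return sorted(positive, key=lambda i: -values[i - 1])


if __name__ == "__main__":
    print(order([0, 2, 0, 3, 1]))
```

Computing the most demanding fringe-set member first is what lets the arrays it releases be reused by the members that follow.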


VORDERl  [1]7 
7  C+ORDFR  X\T\V 
Cl]  O)  0 

[2]  V+TAGU\IVAl] 

[3]  Hl]X3 

[4]  -*ORD2 

[5]  ORDliT+Vxt/V 

[6]  F[7]<K) 

[7]  G*C,T 

[8]  ORD2i-*ORDU\V/{/>0 

PRT X

PRT prints the alg-tree whose root fv-set, connection fv-sets and
fringe fv-sets are listed, represented by their FVSET indices, in vector
X. PRT also assigns intermediate 2-array numbers to the root fv-set's
node, and frees the 2-arrays assigned the fringe fv-sets, using NEWM
and FREEM.


VP/7T[D]7 

V  PUT  X;Y 

[ 1 ]  X  ' FODFL 7 :  •  \TAC[X\ IUODF] ) 

[2]  O rEEPTS:  * \TAGIX\IALG]) 

[3]  r-*[l] 

[4]  SAM+TAnU-JCAr] 

[5]  +PR\*\SAM*0 

[  6  ]  MATPl  Y]+MA  TRl  RAM) 

[7]  •♦P/72 

[8]  Pm-JfATRlYl-NFtff 
L91  P/72 : FPFFM  MATRIX  1 

[10]  0(  ' TFMPS :  -MATRIX)) 

[11]  l>»0 

V 


NEWM 

NEWM's value is the index of the first "available" intermediate
2-array. NEWM also sets that 2-array unavailable.


v//pw/trjv 

7  ZH1FWM 

[1]  Z+AVAIU  0 

[2]  AVAIULZ\+ 1 

[3]  Z+--7 


V 




FREEM  X 

Frees  (makes  available)  intermediate  matrix  X. 


VFREEtltm 
7  FREFM  X;IiT 
Cl]  1*1 

[2]  -+FR2 

[3]  FR1:T+-Xin 

[4]  *FR2*\T&Q 

[5]  AVAILIT ]*v 

[6]  FR2iI*I+l 

C  7  3  +FRl*\ISpX 

V 


Input  subroutine 
INTREE  T 

INTREE accepts a parse-tree or EEPT from the keyboard. Its
output is the "FATHERS" vector (in F), the "OPERATORS" vector (in O),
and a generated matrix of sons (in S), giving the "reverse" links
of the FATHER vector. Its single argument, T, is used to "relocate"
the list structure produced in S and F, so that the value actually
stored in F satisfies F[I] = T+J, where node J is node I's father,
according to the input structure.
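
The construction of the sons matrix with relocation can be sketched in Python as below. This is a hedged illustration, not the INTREE listing: the function name `build_sons`, the limit of two sons per node, the use of 0 for "no son", and the decision to leave the root's entry of F untouched are assumptions made for the example.

```python
def build_sons(fathers, t=0, max_sons=2):
    """Build relocated son and father tables from an input FATHERS vector.

    Returns (s, f): s[j - 1] lists the relocated numbers of node j's sons
    (0 where there is none), and f[i - 1] = t + J for every non-root node i
    whose input father is J. The root's entry of f is copied unchanged.
    """
    n = len(fathers)
    s = [[0] * max_sons for _ in range(n)]
    filled = [0] * n                       # sons recorded so far, per node
    f = list(fathers)
    for i in range(1, n):                  # node n is the root
        j = fathers[i - 1]
        s[j - 1][filled[j - 1]] = i + t    # left sons have smaller numbers
        filled[j - 1] += 1
        f[i - 1] = j + t
    return s, f


if __name__ == "__main__":
    print(build_sons([3, 3, 2]))
    print(build_sons([3, 3, 2], t=3))
```

With T set to the number of forest nodes already stored, each EEPT read in this way would occupy its own contiguous block of node numbers, which matches the relocation described for the EEPT forest.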


VJUTREFiWW 
V  I11TREE  TihJiM 
Cl]  1}+' FATHERS' 

C  2]  F* C 

C  3]  0*F 

C  4]  -K)m1  ip,F 

C  5]  MXSNS-2 

C6]  SH(.pF).MXSFS)pO 

C7]  0*(pF)pl 

C  8]  J-l 

CO]  IRT2:J*-nn 

CIO]  SlJi0lJ11*I+T 

Cll]  0CJXX7M 

c  12]  nn*J+r 

C 13]  1*1+1 

(14]  ■+IHT2*\I<pF 

CIS]  \>' OPERATORS' 

C 16]  OH' 

[17]  J+0*F 


